vLLM + k8s on Bare Metal

Ollama on Ubuntu was an easy enough install, and with the addition of Open WebUI we had an easy-to-use in-house tool - but we stopped short of voice chat, for lack of TLS.
The next project is to move to a production-grade stack: vLLM on Kubernetes.
All of this is going to require our own DNS server - why haven't we done this already?
- DNS
- Kubernetes on Ubuntu
- vLLM
- https://docs.vllm.ai/en/latest/getting_started/quickstart/
- https://ploomber.io/blog/vllm-deploy/
- https://www.linkedin.com/posts/satyamallick_vllm-deploying-llms-at-scale-like-openai-activity-7397281270063542273-HPXm/
- https://github.com/vllm-project/vllm
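Worth noting from the quickstart above: once vLLM is serving, it exposes an OpenAI-compatible HTTP API (port 8000 by default). A minimal client sketch, using only the standard library - the model name and host here are placeholders, not what we'll actually deploy:

```python
import json
import urllib.request


def build_chat_request(model: str, prompt: str, max_tokens: int = 64) -> dict:
    """Build an OpenAI-style chat-completions payload for a vLLM server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def send(host: str, payload: dict) -> dict:
    """POST the payload to vLLM's OpenAI-compatible endpoint."""
    req = urllib.request.Request(
        f"http://{host}:8000/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Hypothetical model name - substitute whatever `vllm serve` is running:
payload = build_chat_request("meta-llama/Llama-3.1-8B-Instruct", "Say hello.")
# result = send("vllm.internal", payload)  # uncomment against a live server
```

The payload shape is the same one OpenWebUI-style frontends speak, which is what makes vLLM a drop-in backend later.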
Here's to the Thanksgiving break, which gives me time to dive in.
Update:
- DNS: now running Technitium for the home network; added ad-blocking lists to clean things up. It lives on the nanostack server.
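To confirm clients are actually using the new resolver and the blocklists are doing their job, a couple of quick queries suffice (the server address is a placeholder; the blocked-domain response depends on which blocking mode Technitium is configured with):

```shell
# Point dig at the Technitium server (substitute your server's IP).
dig @<server-ip> example.com +short        # a normal lookup should resolve
dig @<server-ip> doubleclick.net +short    # a blocklisted domain typically
                                           # returns 0.0.0.0 or NXDOMAIN
```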
- MicroK8s: running a four-node cluster - one control-plane node, three worker nodes.
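For reference, the MicroK8s bring-up is roughly the following (the join command is a sketch - run the exact one `add-node` prints, token and all):

```shell
# On the control-plane node:
sudo snap install microk8s --classic
microk8s status --wait-ready
microk8s add-node            # prints a one-time join command

# On each of the three worker nodes, paste the printed command, e.g.:
# microk8s join <control-plane-ip>:25000/<token> --worker

# Back on the control plane, all four nodes should show up:
microk8s kubectl get nodes
```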
- vLLM took a back seat over Turkey Day; back at it now.
- Added Ollama running as a service with Open WebUI on Ubuntu (the System76 server). Adding an nginx reverse proxy as a frontend so we can enable TLS for voice chat.
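The TLS bit matters because browsers only grant microphone access in a secure context, which is exactly what blocked voice chat earlier. A minimal sketch of the nginx side, assuming Open WebUI listens on port 3000 and certificates already exist - hostname, paths, and port are all placeholders:

```nginx
server {
    listen 443 ssl;
    server_name chat.example.internal;              # placeholder hostname

    ssl_certificate     /etc/nginx/certs/chat.crt;  # placeholder paths
    ssl_certificate_key /etc/nginx/certs/chat.key;

    location / {
        proxy_pass http://127.0.0.1:3000;           # Open WebUI (assumed port)
        proxy_http_version 1.1;
        # WebSocket upgrade headers, which Open WebUI's chat relies on:
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
    }
}
```

With the in-house DNS from above, the hostname can resolve to the proxy internally, so everything stays on the LAN.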
Here's to more fun.