Reputation: 1
I am struggling with an issue while trying to host an LLM with vLLM for RAG. When I start the vLLM server with minimal arguments, for example vllm serve --model NousResearch/Hermes-2-Pro-Llama-3-8B, the server starts, but when I send a request I get an error saying that I should define a template.
If I try starting the server with more arguments based on the vLLM documentation: vllm serve --model mistralai/Mistral-7B-Instruct-v0.3 --chat-template examples/tool_chat_template_mistral.jinja --enable-auto-tool-choice --tool-call-parser mistral --gpu-memory-utilization=0.5, I get the error: vllm: error: unrecognized arguments: --enable-auto-tool-choice --tool-call-parser.
Can anyone provide any help or advice? Thanks in advance!
from openai import OpenAI

openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

# Point the OpenAI client at the local vLLM server
client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

chat_response = client.chat.completions.create(
    model="facebook/opt-125m",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a joke."},
    ],
)
print("Chat response:", chat_response)
Upvotes: 0
Views: 610
Reputation: 525
You can start the OpenAI-compatible server either with:
python -m vllm.entrypoints.openai.api_server --model "<path-to-model>" --tensor-parallel-size 2 --gpu-memory-utilization 0.6 --host 0.0.0.0 --port 9001
or with:
vllm serve "<path-to-model>" --tensor-parallel-size 2 --gpu-memory-utilization 0.6 --host 0.0.0.0 --port 9001
where <path-to-model> is a local model directory, e.g. './Vistral-7B-Chat'.
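Once the server is running, you can sanity-check it and see the exact model id to pass from the client (a minimal sketch, assuming the host and port from the command above):

from openai import OpenAI

# Point the client at the vLLM server started above (port 9001)
client = OpenAI(api_key="EMPTY", base_url="http://localhost:9001/v1")

# List the models the server is actually serving; use one of these ids
# as the "model" argument in chat/completion requests
print([m.id for m in client.models.list().data])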
from langchain_community.llms.vllm import VLLMOpenAI

# These match the server command above; adjust to your setup
OPEN_API_BASE = "http://localhost:9001/v1"
MODEL_NAME = "./Vistral-7B-Chat"

llm = VLLMOpenAI(
    openai_api_key="EMPTY",
    openai_api_base=OPEN_API_BASE,
    model_name=MODEL_NAME,
    max_tokens=1024,
    temperature=0.1,
    top_p=1,
    frequency_penalty=0,
    model_kwargs={"stop": ["```", "<|eot_id|>", "<End>"]},
)
For example, a minimal call with the client above might look like this (a sketch assuming the server is running on port 9001 and serving './Vistral-7B-Chat'):
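# Hypothetical prompt; llm.invoke() sends a completion request to the vLLM server
prompt = "Summarize vLLM in one sentence."
answer = llm.invoke(prompt)
print(answer)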
Upvotes: 0