Laz

Reputation: 1

Host a Model with vLLM for RAG

I am struggling to host an LLM with vLLM for RAG. When I start the vLLM server with minimal arguments, for example vllm serve --model NousResearch/Hermes-2-Pro-Llama-3-8B, the server starts, but when I send a request I get an error saying that I should define a chat template.

If I try starting the server with more arguments based on the vLLM documentation: vllm serve --model mistralai/Mistral-7B-Instruct-v0.3 --chat-template examples/tool_chat_template_mistral.jinja --enable-auto-tool-choice --tool-call-parser mistral --gpu-memory-utilization=0.5, I get the error: vllm: error: unrecognized arguments: --enable-auto-tool-choice --tool-call-parser.

Can anyone provide any help or advice? Thanks in advance!

from openai import OpenAI

openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

chat_response = client.chat.completions.create(
    model="facebook/opt-125m",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a joke."},
    ]
)

print("Chat response:", chat_response)

Upvotes: 0

Views: 610

Answers (1)

happy

Reputation: 525

  1. Serve the model:
python -m vllm.entrypoints.openai.api_server --model "<path-to-model>" --tensor-parallel-size 2 --gpu-memory-utilization 0.6 --host 0.0.0.0 --port 9001

or

vllm serve "<Path-to-model>" --tensor-parallel-size 2 --gpu-memory-utilization 0.6 --host 0.0.0.0 --port 9001

<Path-to-model> = './Vistral-7B-Chat'
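Once the server is up, you can quickly verify that it is reachable. A minimal sketch using the openai client (the port and "EMPTY" key are assumptions matching the serve command above; vLLM exposes an OpenAI-compatible /v1/models endpoint):

from openai import OpenAI

# vLLM's OpenAI-compatible server ignores the key, but the client requires one
client = OpenAI(api_key="EMPTY", base_url="http://localhost:9001/v1")

# List the served models; the model id is the path you passed to vllm serve
for model in client.models.list():
    print(model.id)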

  2. You can connect using VLLMOpenAI:
from langchain_community.llms.vllm import VLLMOpenAI

llm = VLLMOpenAI(
    openai_api_key="EMPTY",
    openai_api_base=OPEN_API_BASE,
    model_name=MODEL_NAME,
    max_tokens=1024,
    temperature=0.1,
    top_p=1,
    frequency_penalty=0,
    model_kwargs={"stop": ["```", "<|eot_id|>", "<End>"]},
)

For example:

  • OPEN_API_BASE = "http://localhost:9001/v1/"
  • MODEL_NAME = "./Vistral-7B-Chat" (the model path)
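Once connected, a minimal usage sketch (the prompt is just an illustration; invoke is LangChain's standard runnable call):

# Send a prompt through the vLLM-served model via LangChain
response = llm.invoke("Tell me a joke.")
print(response)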

Upvotes: 0
