Reputation: 1
I am struggling with an issue while trying to host an LLM with vLLM for RAG. When I start the vLLM server with minimal arguments, for example vllm serve --model NousResearch/Hermes-2-Pro-Llama-3-8B, the server starts, but when I send a request I get an error saying that I should define a template.
If I try starting the server with more arguments based on the vLLM documentation: vllm serve --model mistralai/Mistral-7B-Instruct-v0.3 --chat-template examples/tool_chat_template_mistral.jinja --enable-auto-tool-choice --tool-call-parser mistral --gpu-memory-utilization=0.5, I get the error: vllm: error: unrecognized arguments: --enable-auto-tool-choice --tool-call-parser.
Can anyone provide any help or advice? Thanks in advance!
from openai import OpenAI

openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

# Point the OpenAI client at the local vLLM server
client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

chat_response = client.chat.completions.create(
    model="facebook/opt-125m",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a joke."},
    ],
)
print("Chat response:", chat_response)
Upvotes: 0
Views: 610
Reputation: 525
You can start the OpenAI-compatible server either with:
python -m vllm.entrypoints.openai.api_server --model "<path-to-model>" --tensor-parallel-size 2 --gpu-memory-utilization 0.6 --host 0.0.0.0 --port 9001
or with:
vllm serve "<path-to-model>" --tensor-parallel-size 2 --gpu-memory-utilization 0.6 --host 0.0.0.0 --port 9001
where <path-to-model> is a local model directory, e.g. './Vistral-7B-Chat'.
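Once the server is running, you can sanity-check it and see the exact model id to pass from the client (a minimal sketch, assuming the host and port from the command above):

from openai import OpenAI

# Point the client at the vLLM server started above (port 9001)
client = OpenAI(api_key="EMPTY", base_url="http://localhost:9001/v1")

# List the models the server is actually serving; use one of these ids
# as the "model" argument in chat/completion requests
print([m.id for m in client.models.list().data])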
from langchain_community.llms.vllm import VLLMOpenAI

# These match the server command above; adjust to your setup
OPEN_API_BASE = "http://localhost:9001/v1"
MODEL_NAME = "./Vistral-7B-Chat"

llm = VLLMOpenAI(
    openai_api_key="EMPTY",
    openai_api_base=OPEN_API_BASE,
    model_name=MODEL_NAME,
    max_tokens=1024,
    temperature=0.1,
    top_p=1,
    frequency_penalty=0,
    model_kwargs={"stop": ["```", "<|eot_id|>", "<End>"]},
)
For example, a minimal call with the client above might look like this (a sketch assuming the server is running on port 9001 and serving './Vistral-7B-Chat'):
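# Hypothetical prompt; llm.invoke() sends a completion request to the vLLM server
prompt = "Summarize vLLM in one sentence."
answer = llm.invoke(prompt)
print(answer)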
Upvotes: 0