I start the llama-cpp-python server with the command:
python -m llama_cpp.server --model D:\Mistral-7B-Instruct-v0.3.Q4_K_M.gguf --n_ctx 8192 --chat_format functionary
Then I run my Python script, which looks like this:
from openai import OpenAI
import json
import requests

try:
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="sk-xxx")
    response = client.chat.completions.create(
        model="mistralai--Mistral-7B-Instruct-v0.3",
        messages=[
            {"role": "user", "content": "hi"},
        ],
    )
    # Extract the assistant's reply
    response_message = response.choices[0].message
    print(response_message)
except Exception as e:
    error_msg = str(e)
    print(f"Exception type: {type(e)}")
However, I don’t know how to set the top_k value to 1.
I tried changing my code to:
from openai import OpenAI
import json
import requests

try:
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="sk-xxx")
    response = client.chat.completions.create(
        model="mistralai--Mistral-7B-Instruct-v0.3",
        messages=[
            {"role": "user", "content": "hi"},
        ],
        top_k=1,
    )
    # Extract the assistant's reply
    response_message = response.choices[0].message
    print(response_message)
except Exception as e:
    error_msg = str(e)
    print(f"Exception type: {type(e)}")
I also tried adding the top_k value when starting the server, like this:
python -m llama_cpp.server --model D:\Mistral-7B-Instruct-v0.3.Q4_K_M.gguf --top-k 1 --n_ctx 8192 --chat_format functionary
But neither approach seems to work. Can anyone help?
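Is the extra_body argument of the OpenAI client the right way to pass top_k? Here is a sketch of what I was thinking of trying, assuming the llama-cpp-python server accepts a top_k field in the chat completions request body (I haven't confirmed that it does):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="sk-xxx")

# extra_body adds extra fields to the request JSON beyond the official
# OpenAI parameters; whether the server honors top_k here is an assumption.
response = client.chat.completions.create(
    model="mistralai--Mistral-7B-Instruct-v0.3",
    messages=[
        {"role": "user", "content": "hi"},
    ],
    extra_body={"top_k": 1},
)
print(response.choices[0].message)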