I start the llama-cpp-python server with the command:
python -m llama_cpp.server --model D:\Mistral-7B-Instruct-v0.3.Q4_K_M.gguf --n_ctx 8192 --chat_format functionary
Then I run my Python script, which looks like this:
from openai import OpenAI
import json
import requests

try:
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="sk-xxx")
    response = client.chat.completions.create(
        model="mistralai--Mistral-7B-Instruct-v0.3",
        messages=[
            {"role": "user", "content": "hi"},
        ],
    )
    # Extract the assistant's reply
    response_message = response.choices[0].message
    print(response_message)
except Exception as e:
    error_msg = str(e)
    print(f"Exception type: {type(e)}")
However, I don’t know how to set the top_k value to 1.
I tried changing my code to:
from openai import OpenAI
import json
import requests

try:
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="sk-xxx")
    response = client.chat.completions.create(
        model="mistralai--Mistral-7B-Instruct-v0.3",
        messages=[
            {"role": "user", "content": "hi"},
        ],
        top_k=1,
    )
    # Extract the assistant's reply
    response_message = response.choices[0].message
    print(response_message)
except Exception as e:
    error_msg = str(e)
    print(f"Exception type: {type(e)}")
I also tried adding the top_k value when starting the server, like this:
python -m llama_cpp.server --model D:\Mistral-7B-Instruct-v0.3.Q4_K_M.gguf --top-k 1 --n_ctx 8192 --chat_format functionary
But neither approach seems to work. Can anyone help?
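Is the extra_body argument of the OpenAI client the right way to pass top_k? Here is a sketch of what I was thinking of trying, assuming the llama-cpp-python server accepts a top_k field in the chat completions request body (I haven't confirmed that it does):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="sk-xxx")

# extra_body adds extra fields to the request JSON beyond the official
# OpenAI parameters; whether the server honors top_k here is an assumption.
response = client.chat.completions.create(
    model="mistralai--Mistral-7B-Instruct-v0.3",
    messages=[
        {"role": "user", "content": "hi"},
    ],
    extra_body={"top_k": 1},
)
print(response.choices[0].message)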