Reputation: 555
I downloaded the shards of the gemma2
model from Hugging Face and then converted them into GGUF
format via the conversion script from the llama.cpp
repository.
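For reference, the conversion step looked roughly like this (the snapshot path is shortened here, and I'm going from memory on the exact invocation; convert_hf_to_gguf.py and its --outtype/--outfile options come from the llama.cpp repository):

python convert_hf_to_gguf.py /path/to/gemma-2-2b-it-snapshot --outtype bf16 --outfile gemma-2-2b-it-BF16.gguf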
Then I tried to run my local gemma2
via the llama-cpp-python bindings
in the following way:
from llama_cpp import Llama

llm = Llama(
    model_path="/home/s1ngle/.cache/huggingface/hub/models--google--gemma-2-2b-it/snapshots/299a8560bedf22ed1c72a8a11e7dce4a7f9f51f8/299a8560bedf22ed1c72a8a11e7dce4a7f9f51f8-2.6B-299a8560bedf22ed1c72a8a11e7dce4a7f9f51f8-BF16.gguf",
    n_gpu_layers=0,
    n_threads=8,
    n_batch=8,
    n_ctx=8192,
    seed=-1,
    f16_kv=True,
    verbose=False,
    cache=False,
    last_n_tokens_size=64,
)

output = llm(
    "Hi! I'm Bob",
    max_tokens=128,
    echo=False,
    temperature=0,
    top_k=10,
    top_p=0.95,
)

print(output)
and I got the following result in the console:
{'id': 'cmpl-21b4cd4a-58e5-4e76-9782-809e3ef0a731', 'object': 'text_completion', 'created': 1726382333, 'model': '/home/s1ngle/.cache/huggingface/hub/models--google--gemma-2-2b-it/snapshots/299a8560bedf22ed1c72a8a11e7dce4a7f9f51f8/299a8560bedf22ed1c72a8a11e7dce4a7f9f51f8-2.6B-299a8560bedf22ed1c72a8a11e7dce4a7f9f51f8-BF16.gguf', 'choices': [{'text': ', a friendly AI assistant. 👋 \n\nHow can I help you today? 😊 \n', 'index': 0, 'logprobs': None, 'finish_reason': 'stop'}], 'usage': {'prompt_tokens': 7, 'completion_tokens': 19, 'total_tokens': 26}}
As you can see, the phrase ", a friendly AI assistant."
has been appended to my prompt, and for some reason this appended phrase is printed out in the LLM's output.
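To confirm the phrase really is part of the generated completion rather than some console formatting, I can print just the text field of the first choice in the returned dict:

print(output["choices"][0]["text"])
# , a friendly AI assistant. 👋 \n\nHow can I help you today? 😊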
Why? How can I prevent it?
Upvotes: 0
Views: 64