Reputation: 80
I am struggling with models not following instructions at all when they are used from Python, even though they work much better when used directly from a shell (like cmd or PowerShell).
Python examples:
Question: llm("Can you solve math questions?") Response: \nCan you solve these math questions?
Question: llm("what is (4.5*2.1)^2.2?") Response: long text output omitted; it was simply unrelated to the question and just asked more questions instead of answering it.
I am trying to use it with LangChain as the LLM for an agent, but the models act too dumb. I should be able to get a correct answer from the following:
from langchain.agents import initialize_agent, load_tools

# give the agent a calculator tool backed by the LLM
tools = load_tools(['llm-math'], llm=llm)

# zero-shot ReAct agent, capped at three reasoning iterations
zero_shot_agent = initialize_agent(
    agent="zero-shot-react-description",
    tools=tools,
    llm=llm,
    verbose=True,
    max_iterations=3,
)
zero_shot_agent("what is (4.5*2.1)^2.2?")
The response I get:
Entering new AgentExecutor chain...
Llama.generate: prefix-match hit
let's get the calculator out!
Action: [Calculator]
Action Input: 4.5 and 2.1 as a ratio
Observation: [Calculator] is not a valid tool, try another one.
Thought:Llama.generate: prefix-match hit
omitting large output
OutputParserException: Could not parse LLM output: ` I will use the power rule for exponents to do this by hand.
Action: (4.5*2.1)^2.2 = 4.5*2.1^2.2`
Is there a way to overcome this problem? I want to use a GGML model (or any model that can run locally on a CPU). The model that produced the outputs above is Manticore 13B q4_0 (though I am sure that larger quantizations, i.e. more bits, e.g. 5 or 8, would not be any better). Also, this kind of error (OutputParserException) only occurs when I use a notebook (ipynb or Google Colab); I usually encounter a different problem when the code is run in a Python REPL (through cmd or PowerShell). The problem I encounter when running code in the REPL is that LangChain just can't use my tools. For example, for my question zero_shot_agent("what is (4.5*2.1)^2.2?")
I get outputs like:
I should use the calculator for this math problem.
Action: [Calculator]
Action Input: press the equals button and type in 4.5 and 2.1, then press the square root button twice
Observation: [Calculator] is not a valid tool, try another one.
I will use a regular calculator.
Action: [Regular Calculator]
Action Input: turn on the calculator and input the problem: (4.5*2.1)^2.2
Observation: [Regular Calculator] is not a valid tool, try another one.
I will use my phone's calculator app.
Action: [Phone Calculator]
Action Input: open the app and input the problem: (4.5*2.1)^2.2
Observation: [Phone Calculator] is not a valid tool, try another one.
Thought:
> Finished chain.
{'input': 'what is (4.5*2.1)^2.2?', 'output': 'Agent stopped due to iteration limit or time limit.'}
Although it stopped at the third iteration (attempt) of solving the problem, I don't see any value in letting it run longer anyway.
Upvotes: 3
Views: 2912
Reputation: 49571
LLMs struggle with logic and calculation; even GPT-4 cannot calculate the exponentiation of decimal numbers. If you calculate (4.5*2.1)**2.2 in a Python interpreter, it gives you the exact answer (see the quick check after the example below). When we interact with LLMs, we have to use the correct agents:
from langchain.agents.agent_toolkits import create_python_agent
from langchain.tools.python.tool import PythonREPLTool
from langchain.llms import OpenAI

# OpenAI uses text-davinci-003 by default
llm = OpenAI(temperature=0)

# this agent can write and execute Python code to answer the question
agent_executor = create_python_agent(llm=llm, tool=PythonREPLTool(), verbose=True)
agent_executor.run("what is (4.5*2.1)^2.2?")
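For comparison, here is the direct computation in plain Python with no LLM involved (the printed value is rounded):
# the reference answer the agent should arrive at
print((4.5 * 2.1) ** 2.2)  # ≈ 139.95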
Proof of work: (screenshot of the agent run omitted)
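If you want to stay fully local, create_python_agent accepts any LLM, so you can pass a GGML model instead of OpenAI. A sketch, assuming LangChain's LlamaCpp wrapper (the model path is a placeholder, and a small quantized model may still struggle to follow the agent's output format):
from langchain.agents.agent_toolkits import create_python_agent
from langchain.tools.python.tool import PythonREPLTool
from langchain.llms import LlamaCpp

# placeholder path; reuse whatever local model object you already have
local_llm = LlamaCpp(model_path="./manticore-13b.ggmlv3.q4_0.bin", temperature=0)

agent_executor = create_python_agent(llm=local_llm, tool=PythonREPLTool(), verbose=True)
agent_executor.run("what is (4.5*2.1)^2.2?")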
Upvotes: 2