zizon

Reputation: 43

RetrievalQA max token limit reached even though the prompt is short

I am building a simple LLM application that uses vector store embeddings built from a text file. My prompt is very short, but whenever I request an answer from the model, I get a message that I have reached the max token limit.

from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

# Load the source text file
loader = TextLoader(path + file_name)
document = loader.load()

# Document Split
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
document = text_splitter.split_documents(document)

# Vector db
vectordb = Chroma.from_documents(
    document,
    embedding=OpenAIEmbeddings(),
    persist_directory='/content/vectordb'
)
vectordb.persist()

# Set the chain
qa = RetrievalQA.from_chain_type(
    llm=OpenAI(),
    retriever=vectordb.as_retriever(search_kwargs={'k': 7}),
    return_source_documents=True
)

# Prompt
prompt = """
1-2 sentences of prompt
"""

# User query
user_query = "short query which the length is about this much"

result = qa({'query': prompt + user_query})
print(result['result'])


And I keep getting this:

InvalidRequestError: This model's maximum context length is 4097 tokens, however you requested 5220 tokens (4964 in your prompt; 256 for the completion). Please reduce your prompt; or completion length.

Any help would be appreciated!

Upvotes: 4

Views: 2276

Answers (1)

fabmeyer

Reputation: 99

Try the following:

  • Which model are you using? They have different max token limits, as can be seen here: https://platform.openai.com/docs/models
  • You can also set a max_tokens parameter in LangChain: llm = OpenAI(model='gpt-3.5-turbo', max_tokens=2048) (see the sketch below)
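A minimal sketch of how this could be applied to the code in the question, assuming the same vectordb, prompt and user_query objects as above. Lowering k is an extra lever worth trying here, because the 4964 prompt tokens in the error come almost entirely from the retrieved chunks, not from the short user query:

# Same legacy langchain import paths as in the question
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA

# Explicit completion budget; the default model is kept here
llm = OpenAI(max_tokens=256)

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type='stuff',  # the default: all retrieved chunks are packed into one prompt
    # Fewer retrieved chunks means less context stuffed into that prompt.
    # Note that CharacterTextSplitter only splits on its separator, so single
    # chunks can end up much longer than chunk_size=1000.
    retriever=vectordb.as_retriever(search_kwargs={'k': 3}),  # was k=7
    return_source_documents=True
)

result = qa({'query': prompt + user_query})
print(result['result'])

If you really need all 7 chunks, chain_type='map_reduce' or 'refine' processes the chunks in separate LLM calls instead of one big prompt, at the cost of extra requests.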

Upvotes: 0
