zizon

Reputation: 43

RetrievalQA max token limit reached even though the prompt is short

I am building a simple LLM application that uses vector store embeddings built from a text file. My prompt is very short, but whenever I request an answer from the model, I get a message that I have reached the max token limit.

from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

# Load the source text file
loader = TextLoader(path + file_name)
document = loader.load()

# Document Split
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
document = text_splitter.split_documents(document)

# Vector db
vectordb = Chroma.from_documents(
    document,
    embedding=OpenAIEmbeddings(),
    persist_directory='/content/vectordb'
)
vectordb.persist()

# Set the chain
qa = RetrievalQA.from_chain_type(
    llm=OpenAI(),
    retriever=vectordb.as_retriever(search_kwargs={'k': 7}),
    return_source_documents=True
)

# Prompt
prompt = """
1-2 sentences of prompt
"""

# User query
user_query = "short query which the length is about this much"

result = qa({'query': prompt + user_query})
print(result['result'])


And I keep getting this:

InvalidRequestError: This model's maximum context length is 4097 tokens, however you requested 5220 tokens (4964 in your prompt; 256 for the completion). Please reduce your prompt; or completion length.

Any help would be appreciated!

Upvotes: 4

Views: 2276

Answers (1)

fabmeyer

Reputation: 99

Try the following:

  • Which model are you using? They have different max token limits, as can be seen here: https://platform.openai.com/docs/models
  • You can also set a max_tokens parameter in LangChain: llm = OpenAI(model='gpt-3.5-turbo', max_tokens=2048) (see the sketch below)
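A minimal sketch of how this could be applied to the code in the question, assuming the same vectordb, prompt and user_query objects as above. Lowering k is an extra lever worth trying here, because the 4964 prompt tokens in the error come almost entirely from the retrieved chunks, not from the short user query:

# Same legacy langchain import paths as in the question
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA

# Explicit completion budget; the default model is kept here
llm = OpenAI(max_tokens=256)

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type='stuff',  # the default: all retrieved chunks are packed into one prompt
    # Fewer retrieved chunks means less context stuffed into that prompt.
    # Note that CharacterTextSplitter only splits on its separator, so single
    # chunks can end up much longer than chunk_size=1000.
    retriever=vectordb.as_retriever(search_kwargs={'k': 3}),  # was k=7
    return_source_documents=True
)

result = qa({'query': prompt + user_query})
print(result['result'])

If you really need all 7 chunks, chain_type='map_reduce' or 'refine' processes the chunks in separate LLM calls instead of one big prompt, at the cost of extra requests.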

Upvotes: 0
