Reputation: 43
I am building a simple LLM app that uses vector-store embeddings built from a text file. My prompt is very short, but whenever I request an answer from the model, I get a message saying I have reached the maximum token limit.
from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

# Load the text file
loader = TextLoader(path + file_name)
document = loader.load()

# Split the document into chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
document = text_splitter.split_documents(document)

# Build the vector store
vectordb = Chroma.from_documents(
    document,
    embedding=OpenAIEmbeddings(),
    persist_directory='/content/vectordb'
)
vectordb.persist()

# Set up the retrieval chain
qa = RetrievalQA.from_chain_type(
    llm=OpenAI(),
    retriever=vectordb.as_retriever(search_kwargs={'k': 7}),
    return_source_documents=True
)

# Prompt
prompt = """
1-2 sentences of prompt
"""

# User query
user_query = "short query which the length is about this much"

result = qa({'query': prompt + user_query})
print(result['result'])
And I keep getting this:
InvalidRequestError: This model's maximum context length is 4097 tokens, however you requested 5220 tokens (4964 in your prompt; 256 for the completion). Please reduce your prompt; or completion length.
Any help would be appreciated!
Upvotes: 4
Views: 2276
Reputation: 99
Try the following:
llm = OpenAI(model_name='gpt-3.5-turbo', max_tokens=2048)
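Then rebuild the chain from your question so it actually uses this llm instead of the default OpenAI() — a minimal sketch, reusing the vectordb, prompt and user_query you already defined:

# Reuse vectordb, prompt and user_query from the question
qa = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectordb.as_retriever(search_kwargs={'k': 7}),  # lowering 'k' would also shrink the retrieved context
    return_source_documents=True
)
result = qa({'query': prompt + user_query})
print(result['result'])

Note that most of the 4964 prompt tokens come from the 7 retrieved chunks, so reducing 'k' in search_kwargs is another way to stay under the limit. Also, depending on your LangChain version, ChatOpenAI may be the expected wrapper for gpt-3.5-turbo.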
Upvotes: 0