Reputation: 603
I currently have LlamaIndex working on some private data just fine; however, it only outputs about 1,000 characters' worth. How do I extend the output until it reaches completion? I know I can bump the tokens a bit, but I'm looking at potentially pages' worth.
from langchain.llms import OpenAI
from llama_index import (GPTVectorStoreIndex, LLMPredictor, PromptHelper,
                         ServiceContext, SimpleDirectoryReader, StorageContext)

# text-davinci-003 via the LangChain OpenAI wrapper
llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-davinci-003"))
max_input_size = 4096    # context window of text-davinci-003
num_output = 100         # tokens reserved for the completion
max_chunk_overlap = 20
chunk_size_limit = 600
prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap, chunk_size_limit=chunk_size_limit)
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)
storage_context = StorageContext.from_defaults(persist_dir="./storage")

# Load documents from both data directories into a single index
documents = []
documents += SimpleDirectoryReader('dataDir1').load_data()
documents += SimpleDirectoryReader('dataDir2').load_data()
index = GPTVectorStoreIndex.from_documents(documents, storage_context=storage_context, service_context=service_context)
storage_context.persist()

query_engine = index.as_query_engine()
resp = query_engine.query("Write a policy that is compliant with XYZ.")
print(resp)
Upvotes: 0
Views: 4797
Reputation: 1
I am also facing the same issue. I don't know the solution for LlamaIndex, but with the /v1/chat/completions API, if the response comes back with data["choices"][0]["finish_reason"] == "length", you can get a continuation by sending exactly the same messages again. Please correct me if any of this is off.
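Something like the following, a minimal sketch assuming the pre-1.0 openai Python SDK and a gpt-3.5-turbo model (not the poster's exact code). Note that this variant appends the partial reply as an assistant message before re-sending, which is a common way to make the repeated request pick up where it stopped:

import openai

messages = [{"role": "user", "content": "Write a policy that is compliant with XYZ."}]
full_text = ""

while True:
    data = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
    choice = data["choices"][0]
    full_text += choice["message"]["content"]
    if choice["finish_reason"] != "length":
        break  # anything other than "length" means the model finished on its own
    # Hand the partial answer back so the next call continues from it
    messages.append({"role": "assistant", "content": choice["message"]["content"]})

print(full_text)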
Upvotes: 0
Reputation: 603
The answer is, unsurprisingly, similar to generating longer text with the OpenAI module. Modify the original code snippet to the following (the changes start at the query_engine line):
from langchain.llms import OpenAI
from llama_index import (GPTVectorStoreIndex, LLMPredictor, PromptHelper,
                         ServiceContext, SimpleDirectoryReader, StorageContext)

# Setup is unchanged from the question
llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-davinci-003"))
max_input_size = 4096
num_output = 100
max_chunk_overlap = 20
chunk_size_limit = 600
prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap, chunk_size_limit=chunk_size_limit)
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)
storage_context = StorageContext.from_defaults(persist_dir="./storage")
documents = []
documents += SimpleDirectoryReader('dataDir1').load_data()
documents += SimpleDirectoryReader('dataDir2').load_data()
index = GPTVectorStoreIndex.from_documents(documents, storage_context=storage_context, service_context=service_context)
storage_context.persist()

query_engine = index.as_query_engine()
qq = "Write a lengthy response to this query"
fullResponse = ''
while True:
    # Re-ask with everything generated so far appended to the query
    resp = query_engine.query(qq + '\n\n' + fullResponse)
    if resp.response != "Empty Response":
        fullResponse += (" " + resp.response)
    else:
        break  # the index has nothing left to add
print("\n\n================================\n\n" + fullResponse)
In short, you take your query, append the response so far, and ask again. And again, and again. If all goes well, the response object eventually comes back with the exact string "Empty Response" (why a string that says that, rather than a string of length zero, I don't know; it's very silly), and you're done.
Be aware that there is an input limitation. In my testing, somewhere around 12k characters (or about 10 printed pages), the input request exceeds the maximum token length that OpenAI allows. Additionally, sometimes OpenAI will just get itself into a loop and re-complete the same fragments over and over. Also, this is a quick way to run up a decent bill with OpenAI if you're not careful: a half dozen different tests rang up about $10 in API calls.
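One way to defend against all three problems is to bound the loop. A sketch of that guard, where the 12,000-character cap and the round limit are assumptions drawn from the numbers above rather than tested values:

MAX_CHARS = 12000   # assumed input ceiling, per the ~12k-character observation above
MAX_ROUNDS = 20     # assumed hard cap on API calls, and therefore on spend

fullResponse = ''
for _ in range(MAX_ROUNDS):
    resp = query_engine.query(qq + '\n\n' + fullResponse)
    if resp.response == "Empty Response":
        break  # finished normally
    fullResponse += " " + resp.response
    if len(qq) + len(fullResponse) > MAX_CHARS:
        break  # stop before the next request exceeds the input limit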
Upvotes: 0