Reputation: 3742
We have a use case for ChatGPT in summarizing long pieces of text (speech-to-text conversations which can be over an hour).
However, we find that the 4k-token limit forces us to truncate the input text to roughly half its length. Processing the text in parts does not work either, since the model retains no history of the previous parts.
What options do we have for submitting a longer request which is over 4k tokens?
Upvotes: 28
Views: 46637
Reputation: 78
I work with long inputs, so I built a tool for myself, and it serves me well. You can find it on my GitHub: https://github.com/LearnFL/proj-python-chat-gpt-interface
The page explains how to use the script. You specify how to split the prompt by providing the desired input length in tokens. The `task` variable holds the instruction telling ChatGPT what you want from it; it is prepended to each batch of text so the model can extract or do whatever you need.
prompt = """A VERY LONG TEXT ON HOW TO USE REGULAR EXPRESSIONS..."""
res = OpenAIAPI.generate(
    prompt, task="Explain how to use re", get='batches', method="chat",
    model="gpt-3.5-turbo-1106", token_size=4000)
print(res)
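The splitting step the tool performs can be sketched in plain Python. This is an illustrative sketch, not code from the linked repository: the names are mine, and it approximates token counts at roughly 4 characters per token, where a real implementation would count tokens with a tokenizer such as tiktoken.

```python
# Rough sketch of token-bounded batching (illustrative names, not the
# linked tool's API). Assumes ~4 characters per token as a crude
# heuristic; a real implementation would use a tokenizer like tiktoken.
def split_into_batches(text, token_size=4000, chars_per_token=4):
    """Split text into chunks of at most ~token_size tokens,
    breaking on whitespace so words stay intact."""
    max_chars = token_size * chars_per_token
    batches, current, current_len = [], [], 0
    for word in text.split():
        # +1 accounts for the joining space
        if current and current_len + len(word) + 1 > max_chars:
            batches.append(" ".join(current))
            current, current_len = [], 0
        current.append(word)
        current_len += len(word) + 1
    if current:
        batches.append(" ".join(current))
    return batches

task = "Explain how to use re"
text = "word " * 10000  # stand-in for a long transcript
for batch in split_into_batches(text, token_size=500):
    prompt = task + "\n\n" + batch  # the task is prepended to every batch
    # ... send `prompt` to the chat API here ...
```

Each batch then fits within the model's context window, at the cost of the model not seeing the other batches.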
Upvotes: 0
Reputation: 885
Another option is the ChatGPT retrieval plugin. It lets you build a vector database from your document's text, which the LLM can then query. See https://github.com/openai/chatgpt-retrieval-plugin
Upvotes: 4
Reputation: 1
One approach to handle long text is to divide it into smaller fragments, retrieve the appropriate pieces according to your task, and then send them through an API call.
Here's a project that can process PDF, txt, and doc files, as well as web pages, and lets you converse with the document. In your case, you could ask a general question like "What is the document about?" to receive a summary, and then ask for more specific details.
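The "retrieve the appropriate pieces" step can be sketched as follows. This is a toy illustration of my own: it ranks fragments by naive word overlap with the question, whereas a real system would rank them by embedding similarity.

```python
# Minimal sketch of "split, retrieve relevant fragments, then ask".
# Scoring is naive word overlap purely for illustration; production
# systems rank fragments by embedding similarity instead.
def retrieve_fragments(fragments, question, top_k=2):
    """Return the top_k fragments sharing the most words with the question."""
    q_words = set(question.lower().split())
    scored = sorted(
        fragments,
        key=lambda f: len(q_words & set(f.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

fragments = [
    "The contract runs from January to December 2023.",
    "Payment terms are net 30 days from invoice date.",
    "The meeting discussed quarterly marketing strategy.",
]
best = retrieve_fragments(fragments, "What are the payment terms?", top_k=1)
# `best` would then be sent to the API together with the question.
```

Only the retrieved fragments go into the prompt, so the request stays under the token limit regardless of how long the full document is.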
Upvotes: 0
Reputation: 1497
You can use GPT-4 for long contexts.
As stated by OpenAI:
GPT-4 is capable of handling over 25,000 words of text, allowing for use cases like long form content creation, extended conversations, and document search and analysis.
Upvotes: -2
Reputation: 251
The closest answer to your question is embeddings.
You can find an overview of what they are here.
I recommend you review this code from the OpenAI Cookbook Github page that used a Web Crawl Q&A example to explain embeddings.
I used the code from Step 5 onwards and altered the location of the text to point it to my file containing the long piece of text.
From:
# Open the file and read the text
with open("text/" + domain + "/" + file, "r", encoding="UTF-8") as f:
text = f.read()
to:
# Open the file and read the text
with open("/my_location/long_text_file.txt", "r", encoding="UTF-8") as f:
text = f.read()
And modified the questions at Step 13 to what I needed to know about the text.
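The core of what the Cookbook does with those embeddings is a nearest-neighbor lookup: embed the question, compare it against the embedding of each text chunk, and put the closest chunks into the prompt as context. A sketch of just that retrieval step, with toy 3-dimensional vectors standing in for real OpenAI embeddings (which would come from the embeddings API for both the chunks and the question):

```python
# Sketch of the retrieval step, with toy 3-dimensional vectors standing
# in for real OpenAI embeddings of each chunk and of the question.
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

chunks = {
    "The speaker opened with a project status update.": [0.9, 0.1, 0.2],
    "Costs overran because of vendor delays.": [0.1, 0.8, 0.3],
    "The call ended with action items for next week.": [0.2, 0.2, 0.9],
}
# Pretend embedding of the question "Why did costs overrun?"
question_vec = [0.15, 0.75, 0.25]

best_chunk = max(chunks, key=lambda c: cosine_similarity(chunks[c], question_vec))
# best_chunk is then placed in the prompt as context for the question
```

Because only the most relevant chunks are sent to the model, the total prompt stays well under the context limit even for an hour-long transcript.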
Upvotes: 8