Brendan Hill

Reputation: 3742

How to send longer text inputs to ChatGPT API?

We have a use case for ChatGPT in summarizing long pieces of text (speech-to-text conversations which can be over an hour).

However, we find that the 4k token limit forces us to truncate the input text to roughly half its original length.

Processing the text in parts does not seem to work, because the model retains no history of the previous parts.

What options do we have for submitting a request that is longer than 4k tokens?

Upvotes: 28

Views: 46637

Answers (5)

Dennis

Reputation: 78

I use long inputs, so I made a tool for myself, and it serves me well. You can find it on my GitHub: https://github.com/LearnFL/proj-python-chat-gpt-interface

The page explains how to use the script.

You specify how you want the prompt split by providing the desired input length in tokens. The task variable holds the instruction telling ChatGPT what you want from it; it is prepended to each batch of text so the model can extract or do whatever you need with each batch.

prompt = """A VERY LONG TEXT ON HOW TO USE          REGULAR EXPRESSIONS..."""
res = OpenAIAPI.generate(
 prompt, task="Explain how to use re",    get='batches', method="chat", model="gpt-    3.5-   turbo-1106", token_size=4000)
print(res) 

Upvotes: 0

RonanOD

Reputation: 885

Another option is the ChatGPT retrieval plugin. This lets you create a vector database of your document's text, which the LLM can then query. See https://github.com/openai/chatgpt-retrieval-plugin
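
If you don't want to run the full plugin, the underlying pattern is easy to sketch. Below is a minimal, self-contained illustration of the same idea (not the plugin's actual code), using the pre-1.0 openai Python package and an in-memory store; the chunk size, model names, file name, and top-k value are all assumptions:

import numpy as np
import openai

openai.api_key = "sk-..."  # your API key

def embed(texts):
    # text-embedding-ada-002 returns one 1536-dim vector per input string
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=texts)
    return np.array([d["embedding"] for d in resp["data"]])

# 1. Split the long document into overlapping chunks
text = open("transcript.txt").read()
chunks = [text[i:i + 2000] for i in range(0, len(text), 1500)]

# 2. Build the "vector database": one embedding per chunk
chunk_vectors = embed(chunks)

# 3. Retrieve the chunks most similar to the question and send only those
question = "What were the main decisions made in this conversation?"
q_vec = embed([question])[0]
scores = chunk_vectors @ q_vec / (
    np.linalg.norm(chunk_vectors, axis=1) * np.linalg.norm(q_vec))
top_chunks = [chunks[i] for i in np.argsort(scores)[-3:]]

answer = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user",
               "content": "Context:\n" + "\n---\n".join(top_chunks)
                          + "\n\nQuestion: " + question}],
)
print(answer["choices"][0]["message"]["content"])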

Upvotes: 4

Chiyu Song

Reputation: 1

One approach to handling long text is to divide it into smaller fragments, retrieve the pieces relevant to your task, and then send those through an API call.
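
For the summarization use case in the question specifically, a related variant (map-reduce style, which skips the retrieval step) is to summarize each fragment and then summarize the summaries. A rough sketch, using the pre-1.0 openai Python package; the chunk size, prompts, and file name are assumptions:

import openai

openai.api_key = "sk-..."

def ask(prompt):
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp["choices"][0]["message"]["content"]

text = open("transcript.txt").read()
# ~8000 characters is roughly 2000 tokens, well under the 4k limit
chunks = [text[i:i + 8000] for i in range(0, len(text), 8000)]

# Map: summarize each chunk independently
partials = [ask("Summarize this part of a conversation:\n\n" + c) for c in chunks]

# Reduce: merge the partial summaries into one final summary
print(ask("Combine these partial summaries of one conversation into a "
          "single coherent summary:\n\n" + "\n\n".join(partials)))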

Here's a project that can process PDF, txt, and doc files, as well as web pages, and that lets you converse with the document. In your case, you could ask a general question like "what is the document about" to receive a summary, and then ask for more specific details.

Upvotes: 0

Masoud Gheisari

Reputation: 1497

You can use GPT-4, which supports longer contexts.

As stated by OpenAI:

GPT-4 is capable of handling over 25,000 words of text, allowing for use cases like long form content creation, extended conversations, and document search and analysis.
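
If you have GPT-4 API access, the only change from a gpt-3.5-turbo call is the model name. A minimal sketch with the pre-1.0 openai Python package (the file name is an assumption, and which model variants you can use depends on your account):

import openai

openai.api_key = "sk-..."

text = open("transcript.txt").read()
resp = openai.ChatCompletion.create(
    # larger-context variants such as gpt-4-32k may be available to your account
    model="gpt-4",
    messages=[{"role": "user",
               "content": "Summarize this conversation:\n\n" + text}],
)
print(resp["choices"][0]["message"]["content"])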

Upvotes: -2

Aklelka

Reputation: 251

The closest answer to your question comes in the form of embeddings.

You can find an overview of what they are here.

I recommend reviewing this code from the OpenAI Cookbook GitHub page, which uses a Web Crawl Q&A example to explain embeddings.

I used the code from Step 5 onwards and altered the text location to point it to my file containing the long piece of text.

From:

# Open the file and read the text
with open("text/" + domain + "/" + file, "r", encoding="UTF-8") as f:
    text = f.read()

to:

# Open the file and read the text
with open("/my_location/long_text_file.txt", "r", encoding="UTF-8") as f:
    text = f.read()

Finally, I modified the questions at Step 13 to ask what I needed to know about the text.
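
For example, the modified Step 13 calls might look like the following; this assumes the notebook's answer_question helper (defined in an earlier step) is in scope, and its exact name and signature may differ between cookbook revisions:

# Hypothetical questions for a long speech-to-text transcript;
# answer_question comes from the cookbook notebook and retrieves
# relevant chunks before querying the model.
print(answer_question(df, question="What is this conversation about?"))
print(answer_question(df, question="What action items were agreed on?"))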

Upvotes: 8
