Reputation: 117
In the following scenario, what's your best approach using GPT-3 API?
Here is what I found to work well:
temperature should be set to 0, so that GPT-3 stays as close as possible to the facts in the data source. Example: Let's say I want to write a paragraph about Subject A, Subject B, and Subject C, and I have 5 articles as references. The OpenAI Playground will look something like this:
Example Article 1
----
Subject A: example A for GPT-3
Subject B: n/a
Subject C: n/a
=========
Example Article 2
----
Subject A: n/a
Subject B: example B for GPT-3
Subject C: n/a
=========
Example Article 3
----
Subject A: n/a
Subject B: n/a
Subject C: example C for GPT-3
=========
Article 1
-----
Subject A:
Subject B:
Subject C:
=========
... repeating with all articles, save to str
=========
str
-----
Subject A:
Subject B:
Subject C:
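The prompt layout above can be assembled programmatically. The following is only a minimal sketch: the helper names `format_block` and `build_prompt` are hypothetical, and the subject labels and separators are copied from the example. The actual API call is left out because it depends on the client version you use; the resulting string would be sent as the prompt with temperature set to 0.

```python
from typing import Dict, List, Optional, Tuple

# Subject labels taken from the example prompt above.
SUBJECTS = ["Subject A", "Subject B", "Subject C"]


def format_block(article: str, answers: Optional[Dict[str, str]] = None) -> str:
    """Format one article block.

    With `answers`, every subject is filled in ("n/a" when missing), which
    gives a worked example. Without `answers`, the subject lines are left
    empty so the model completes them.
    """
    lines = [article, "----"]
    for subject in SUBJECTS:
        value = answers.get(subject, "n/a") if answers is not None else ""
        lines.append(f"{subject}: {value}".rstrip())
    return "\n".join(lines)


def build_prompt(examples: List[Tuple[str, Dict[str, str]]], article: str) -> str:
    """Join the worked examples and the open article into one prompt string."""
    blocks = [format_block(text, answers) for text, answers in examples]
    blocks.append(format_block(article))
    return "\n=========\n".join(blocks)


if __name__ == "__main__":
    examples = [
        ("Example Article 1", {"Subject A": "example A for GPT-3"}),
        ("Example Article 2", {"Subject B": "example B for GPT-3"}),
        ("Example Article 3", {"Subject C": "example C for GPT-3"}),
    ]
    # Send this string as the prompt with temperature=0 (call not shown).
    print(build_prompt(examples, "Article 1"))
```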
Upvotes: 1
Views: 501
Reputation: 151
Okay, here is the approach that I've tried. First, I take all the articles and do some preprocessing on them. This preprocessing removes unwanted content from the articles, which reduces the token count. Then I calculate the number of tokens in the resulting string. I would suggest keeping a maximum length of 3500 tokens even though the limit is 4097, because the limit covers the prompt, your content, and the response, so 3500 leaves some buffer. If the string's token count exceeds 3500, I split it into chunks and pass them to the OpenAI API (I would be careful about passing these chunks inside a loop, since each call has a cost). I then generate a summary for each chunk, concatenate the generated summaries, and pass the result to the API to generate a final summary. When splitting into chunks, make sure the last chunk does not end up with fewer than 100 tokens, for better accuracy.
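A rough sketch of this chunk-then-summarize flow is below. It rests on several assumptions: tokens are approximated by whitespace-separated words (a real tokenizer would give different counts), `summarize` is a hypothetical callable standing in for whatever API call you make, and chunks are balanced evenly, which is one simple way to avoid a tiny last chunk rather than the exact rule described above. Each chunk still costs one call, plus one for the final summary.

```python
import math
from typing import Callable, List

MAX_TOKENS = 3500  # buffer below the 4097-token limit mentioned above


def split_evenly(words: List[str], max_tokens: int = MAX_TOKENS) -> List[List[str]]:
    """Split words into the fewest chunks of at most max_tokens each.

    Sizes are balanced (they differ by at most one word), so the last
    chunk is never a small leftover.
    """
    if not words:
        return []
    n_chunks = math.ceil(len(words) / max_tokens)
    base, extra = divmod(len(words), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)
        chunks.append(words[start:start + size])
        start += size
    return chunks


def summarize_long_text(
    text: str,
    summarize: Callable[[str], str],
    max_tokens: int = MAX_TOKENS,
) -> str:
    """Summarize text, going through per-chunk summaries when it is too long."""
    words = text.split()  # assumption: one word is roughly one token
    if len(words) <= max_tokens:
        return summarize(text)
    partial = [summarize(" ".join(chunk)) for chunk in split_evenly(words, max_tokens)]
    # Assumes the combined partial summaries fit in a single request.
    return summarize(" ".join(partial))
```

To use it, pass a function that sends its argument to the API and returns the generated summary as `summarize`.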
Upvotes: 1
Reputation: 83177
One may use the Python library GPT Index (MIT license) to summarize a collection of documents. From the documentation:
index = GPTTreeIndex(documents)
response = index.query("<summarization_query>", mode="summarize")
The “default” mode for a tree-based query is traversing from the top of the graph down to leaf nodes. For summarization purposes we will want to use mode="summarize". A summarization query could look like one of the following:
- “What is a summary of this collection of text?”
- “Give me a summary of person X’s experience with the company.”
Upvotes: 1