Reputation: 410
I'm using ChatGPT's API to discuss book topics. For ChatGPT to understand the whole story, I have to include the full context in every request.
This means that all previous user questions and ChatGPT replies are sent along with each new request, so I very quickly hit the model's maximum token limit, and usage fees also increase rapidly.
Please show me a concise way to reduce the number of tokens sent, and thereby reduce costs.
Below is an example of what my ChatGPT request looks like (simplified; Node.js with the OpenAI SDK v4, and placeholder content):
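import OpenAI from "openai";
const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

const messages = [
  // the whole book context is resent on every call
  { role: "system", content: "You are a literary assistant. Book summary: ... (placeholder)" },
  // ...plus every earlier question and answer, so the array keeps growing
  { role: "user", content: "Who is the narrator?" },
  { role: "assistant", content: "The story is narrated by ... (placeholder)" },
  { role: "user", content: "Why does she leave home in chapter 3?" },
];

const completion = await openai.chat.completions.create({
  model: "gpt-3.5-turbo",
  messages, // everything above is billed as prompt tokens on every request
});
console.log(completion.choices[0].message.content);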
Upvotes: 3
Views: 5949
Reputation: 1
Use LangChain! It offers many useful building blocks, such as document loaders, vector store integrations, and caching. In my view, you should store the data in a PDF or text file, then load it and chunk it into smaller pieces. With an embedding model you can build a retrieval QA chain that sends only the chunks relevant to each question, and caching reduces token usage when the same questions are asked repeatedly.
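If you don't want to pull in the whole framework, the same retrieval idea can be hand-rolled. Below is a minimal sketch (the OpenAI Node SDK v4 is assumed; the chunk size, embedding model, and top-3 cutoff are illustrative choices, not LangChain's API):

import OpenAI from "openai";
const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// split the book into overlapping chunks small enough to embed
function chunkText(text, size = 1000, overlap = 200) {
  const chunks = [];
  for (let i = 0; i < text.length; i += size - overlap) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}

// cosine similarity between two embedding vectors
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

async function answerFromBook(bookText, question) {
  const chunks = chunkText(bookText);
  // embed all chunks; for a long book you would batch this and persist
  // the vectors (that is what a vector database does for you)
  const { data: chunkEmbeddings } = await openai.embeddings.create({
    model: "text-embedding-ada-002",
    input: chunks,
  });
  const { data: [questionEmbedding] } = await openai.embeddings.create({
    model: "text-embedding-ada-002",
    input: question,
  });
  // keep only the 3 chunks most similar to the question
  const top = chunkEmbeddings
    .map((d, i) => ({ chunk: chunks[i], score: cosine(d.embedding, questionEmbedding.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, 3);
  // only the relevant excerpts are sent, not the whole book or chat history
  const res = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: [
      { role: "system", content: "Answer using only the provided excerpts." },
      { role: "user", content: top.map((t) => t.chunk).join("\n---\n") + "\n\nQuestion: " + question },
    ],
  });
  return res.choices[0].message.content;
}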
Upvotes: -2
Reputation: 135
A simple and fast method is to implement your own solution that repeatedly removes the oldest messages from the messages array, so that the number of tokens you send (input/prompt tokens) plus the number of tokens you specify as max_tokens (max completion tokens) stays within the model's token limit (4,096 for gpt-3.5-turbo):
const max_tokens = 1000; // max response tokens from OpenAI
const modelTokenLimit = 4096; // gpt-3.5-turbo token limit

// rough estimate (~4 characters per token for English text); use a real
// tokenizer such as tiktoken if you need exact counts
const calcMessagesTokens = (messages) =>
  messages.reduce((sum, m) => sum + Math.ceil(m.content.length / 4), 0);

// ensure prompt tokens + max completion tokens stay within the model's token limit
while (calcMessagesTokens(messages) > modelTokenLimit - max_tokens) {
  messages.splice(1, 1); // drop the oldest message after the system message
}

// send request to OpenAI
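After trimming, the request itself is the usual chat completion call (OpenAI Node SDK v4 assumed here; max_tokens caps the completion so prompt + completion stays within the limit):

import OpenAI from "openai";
const openai = new OpenAI();

const completion = await openai.chat.completions.create({
  model: "gpt-3.5-turbo",
  messages,   // the trimmed array from above
  max_tokens, // max completion tokens, as budgeted for in the loop
});
console.log(completion.choices[0].message.content);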
Upvotes: 1
Reputation: 31
I have two solutions:
Upvotes: 1