CDN

Reputation: 410

Is there a way to reduce the number of tokens sent to ChatGPT (as context)?

I'm using ChatGPT's API to discuss book topics. In order for ChatGPT to understand the whole story, I had to include context.

This means that all previous user questions and ChatGPT replies are sent with every request, so the maximum supported token limit is reached very quickly, and usage fees also increase rapidly.

Please show me a simple way to reduce the number of tokens sent, thereby reducing costs.

Below is an example of my ChatGPT request.

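Roughly, it looks like this (a sketch assuming the standard Chat Completions endpoint with gpt-3.5-turbo; the book context and every previous turn are resent with each new question, which is what drives the token count up):

const messages = [
  { role: "system", content: "You are a helpful assistant. Here is the book: ..." }, // large context
  { role: "user", content: "Who is the main character?" },
  { role: "assistant", content: "The main character is ..." },
  { role: "user", content: "Why does she leave the city?" }, // each new question is appended
];

// the whole messages array (context + all previous turns) is sent on every request
const response = await fetch("https://api.openai.com/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
  },
  body: JSON.stringify({ model: "gpt-3.5-turbo", messages, max_tokens: 1000 }),
});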

Upvotes: 3

Views: 5949

Answers (3)

Hariprasath

Reputation: 1

Use LangChain! It offers many features, such as data loaders, vector databases, and caching. In my view, store the data in a PDF or text file, then load it and chunk it into smaller pieces. With an embedding model you can build a retrieval QA chain, and caching helps reduce token usage when the same questions are asked repeatedly. A rough sketch is below.
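
A minimal sketch of that flow, chunking a text file, embedding the chunks, and answering each question from the retrieved pieces only. It assumes LangChain JS classes such as RecursiveCharacterTextSplitter, MemoryVectorStore, OpenAIEmbeddings, ChatOpenAI and RetrievalQAChain; import paths and option names vary between LangChain versions:

import * as fs from "fs";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";
import { ChatOpenAI } from "langchain/chat_models/openai";
import { RetrievalQAChain } from "langchain/chains";

// load the book and split it into small chunks
const text = fs.readFileSync("book.txt", "utf8");
const splitter = new RecursiveCharacterTextSplitter({ chunkSize: 1000, chunkOverlap: 100 });
const docs = await splitter.createDocuments([text]);

// embed the chunks once and keep them in an in-memory vector store
const vectorStore = await MemoryVectorStore.fromDocuments(docs, new OpenAIEmbeddings());

// each question only sends the few most relevant chunks, not the whole book
const chain = RetrievalQAChain.fromLLM(
  new ChatOpenAI({ modelName: "gpt-3.5-turbo" }),
  vectorStore.asRetriever()
);
const res = await chain.call({ query: "Who is the main character?" });
console.log(res.text);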

Upvotes: -2

Huboh

Reputation: 135

A simple and fast method is to implement your own solution: repeatedly remove messages from the messages array until the number of tokens you send (input/prompt tokens) plus the number of tokens you specified as max_tokens (max completion tokens) fits within the model’s token limit (4096 for gpt-3.5-turbo).

const max_tokens = 1000; // max response tokens from OpenAI
const modelTokenLimit = 4096; // gpt-3.5-turbo tokens limit

// ensure prompt tokens + max completion tokens from OpenAI is within model’s tokens limit
while (calcMessagesTokens(messages) > (modelTokenLimit - max_tokens)) {
      messages.splice(1, 1); // remove first message that comes after system message 
}

// send request to OpenAI
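
The calcMessagesTokens helper above is not defined; a rough sketch, assuming roughly 4 characters per token on average (a real tokenizer such as the tiktoken package gives exact counts):

// very rough token estimate: ~4 characters per token plus some per-message overhead
function calcMessagesTokens(messages) {
  const chars = messages.reduce((sum, m) => sum + m.role.length + m.content.length, 0);
  return Math.ceil(chars / 4) + messages.length * 4;
}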

Upvotes: 1

CSE

Reputation: 31

I have two suggestions:

  1. Try learning LangChain. It can shorten the content you send in, though I don't know whether it really reduces the tokens that ChatGPT charges for: https://js.langchain.com/docs/modules/chains/other_chains/summarization
  2. If a conversation cannot fit within the model’s token limit, it will need to be shortened in some way. This can be achieved with a kind of rolling log of conversation history, where only the last n dialog turns are re-submitted, as in the sketch below.
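
A rough sketch of that rolling log, assuming messages[0] is the system message holding the book context and MAX_TURNS is an arbitrary cutoff:

const MAX_TURNS = 5; // keep only the last 5 user/assistant exchanges (arbitrary cutoff)

function rollingHistory(messages) {
  const [system, ...rest] = messages;      // keep the system message with the book context
  const tail = rest.slice(-MAX_TURNS * 2); // last n question/answer pairs
  return [system, ...tail];
}

// send rollingHistory(messages) to the API instead of the full messages array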

Upvotes: 1
