Reputation: 1889
I am making a request to the completions endpoint. My prompt is 1360 tokens, as verified by the Playground and the Tokenizer. I won't show the prompt as it's a little too long for this question.
Here is my request to OpenAI in Node.js using the openai npm package.
const response = await openai.createCompletion({
  model: 'text-davinci-003',
  prompt,
  max_tokens: 4000,
  temperature: 0.2
})
When testing in the Playground, my total token count after the response is 1374.
When submitting my prompt via the completions API I am getting the following error:
error: {
  message: "This model's maximum context length is 4097 tokens, however you requested 5360 tokens (1360 in your prompt; 4000 for the completion). Please reduce your prompt; or completion length.",
  type: 'invalid_request_error',
  param: null,
  code: null
}
If you have been able to solve this one, I'd love to hear how you did it.
Upvotes: 43
Views: 108171
Reputation: 22880
The model's token limit (its context length) is shared between the prompt and the completion: the tokens in your prompt plus the max_tokens you request must together not exceed the limit of the particular OpenAI model.
As stated in the official OpenAI article:
Depending on the model used, requests can use up to 4097 tokens shared between prompt and completion. If your prompt is 4000 tokens, your completion can be 97 tokens at most.

The limit is currently a technical limitation, but there are often creative ways to solve problems within the limit, e.g. condensing your prompt, breaking the text into smaller pieces, etc.
Note: For counting tokens before(!) sending an API request, see this answer.
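For example, counting locally in Node.js might look like the following. This is a minimal sketch assuming the community gpt-3-encoder npm package; its tokenizer matches the GPT-3 era models, so treat the count as an estimate for newer ones.

// Count the prompt's tokens locally before sending the request.
const { encode } = require('gpt-3-encoder')

const promptTokens = encode(prompt).length
console.log(`Prompt uses ${promptTokens} tokens`)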
LATEST MODEL | DESCRIPTION | MAX TOKENS | TRAINING DATA |
---|---|---|---|
gpt-4-1106-preview | **GPT-4 Turbo** The latest GPT-4 model with improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Returns a maximum of 4,096 output tokens. This preview model is not yet suited for production traffic. | 128,000 tokens | Up to Apr 2023 |
gpt-4-vision-preview | **GPT-4 Turbo with vision** Ability to understand images, in addition to all other GPT-4 Turbo capabilities. Returns a maximum of 4,096 output tokens. This is a preview model version and not yet suited for production traffic. | 128,000 tokens | Up to Apr 2023 |
gpt-4 | Currently points to gpt-4-0613. See continuous model upgrades. | 8,192 tokens | Up to Sep 2021 |
gpt-4-0613 | Snapshot of gpt-4 from June 13th 2023 with improved function calling support. | 8,192 tokens | Up to Sep 2021 |
gpt-4-32k | Currently points to gpt-4-32k-0613. See continuous model upgrades. | 32,768 tokens | Up to Sep 2021 |
gpt-4-32k-0613 | Snapshot of gpt-4-32k from June 13th 2023 with improved function calling support. | 32,768 tokens | Up to Sep 2021 |
gpt-4-0314 (Legacy) | Snapshot of gpt-4 from March 14th 2023 with function calling support. This model version will be deprecated on June 13th 2024. | 8,192 tokens | Up to Sep 2021 |
gpt-4-32k-0314 (Legacy) | Snapshot of gpt-4-32k from March 14th 2023 with function calling support. This model version will be deprecated on June 13th 2024. | 32,768 tokens | Up to Sep 2021 |
LATEST MODEL | DESCRIPTION | MAX TOKENS | TRAINING DATA |
---|---|---|---|
gpt-3.5-turbo-1106 | **Updated GPT-3.5 Turbo** The latest GPT-3.5 Turbo model with improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Returns a maximum of 4,096 output tokens. | 16,385 tokens | Up to Sep 2021 |
gpt-3.5-turbo | Currently points to gpt-3.5-turbo-0613. Will point to gpt-3.5-turbo-1106 starting Dec 11, 2023. See continuous model upgrades. | 4,096 tokens | Up to Sep 2021 |
gpt-3.5-turbo-16k | Currently points to gpt-3.5-turbo-16k-0613. Will point to gpt-3.5-turbo-1106 starting Dec 11, 2023. See continuous model upgrades. | 16,385 tokens | Up to Sep 2021 |
gpt-3.5-turbo-instruct | Similar capabilities to text-davinci-003, but compatible with the legacy Completions endpoint and not Chat Completions. | 4,096 tokens | Up to Sep 2021 |
gpt-3.5-turbo-0613 (Legacy) | Snapshot of gpt-3.5-turbo from June 13th 2023. Will be deprecated on June 13th 2024. | 4,096 tokens | Up to Sep 2021 |
gpt-3.5-turbo-16k-0613 (Legacy) | Snapshot of gpt-3.5-turbo-16k from June 13th 2023. Will be deprecated on June 13th 2024. | 16,385 tokens | Up to Sep 2021 |
gpt-3.5-turbo-0301 (Legacy) | Snapshot of gpt-3.5-turbo from March 1st 2023. Will be deprecated on June 13th 2024. | 4,096 tokens | Up to Sep 2021 |
LATEST MODEL | DESCRIPTION | MAX TOKENS | TRAINING DATA |
---|---|---|---|
text-curie-001 | Very capable, faster and lower cost than Davinci. | 2,049 tokens | Up to Oct 2019 |
text-babbage-001 | Capable of straightforward tasks, very fast, and lower cost. | 2,049 tokens | Up to Oct 2019 |
text-ada-001 | Capable of very simple tasks, usually the fastest model in the GPT-3 series, and lowest cost. | 2,049 tokens | Up to Oct 2019 |
davinci | Most capable GPT-3 model. Can do any task the other models can do, often with higher quality. | 2,049 tokens | Up to Oct 2019 |
curie | Very capable, but faster and lower cost than Davinci. | 2,049 tokens | Up to Oct 2019 |
babbage | Capable of straightforward tasks, very fast, and lower cost. | 2,049 tokens | Up to Oct 2019 |
ada | Capable of very simple tasks, usually the fastest model in the GPT-3 series, and lowest cost. | 2,049 tokens | Up to Oct 2019 |
LATEST MODEL | DESCRIPTION | MAX TOKENS | TRAINING DATA |
---|---|---|---|
babbage-002 | Replacement for the GPT-3 ada and babbage base models. | 16,384 tokens | Up to Sep 2021 |
davinci-002 | Replacement for the GPT-3 curie and davinci base models. | 16,384 tokens | Up to Sep 2021 |
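If you target several models, it can help to keep the context limits from the tables above in one place. A minimal sketch (the CONTEXT_LENGTH map and the helper are mine, not part of the openai package):

// Context window sizes, taken from the tables above; text-davinci-003's
// 4,097 comes from the error message in the question.
const CONTEXT_LENGTH = {
  'text-davinci-003': 4097,
  'gpt-3.5-turbo': 4096,
  'gpt-3.5-turbo-16k': 16385,
  'gpt-3.5-turbo-1106': 16385,
  'gpt-4': 8192,
  'gpt-4-32k': 32768,
  'gpt-4-1106-preview': 128000,
}

// Largest completion a given prompt leaves room for. Note that the
// 1106 preview models additionally cap output at 4,096 tokens.
function maxCompletionTokens(model, promptTokens) {
  return CONTEXT_LENGTH[model] - promptTokens
}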
Upvotes: 54
Reputation: 7488
An important note for gpt-3.5-turbo and gpt-4 users, as per the documentation:
ChatGPT models like gpt-3.5-turbo and gpt-4 use tokens in the same way as older completions models, but because of their message-based formatting, it's more difficult to count how many tokens will be used by a conversation.
Please refer to the OpenAI Cookbook for examples of how to deal with this if you're receiving this error because of wrongly calculated tokens. See also the official docs, which include an example.
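For chat models, a rough Node.js port of the Cookbook's num_tokens_from_messages() could look like the sketch below. It assumes the js-tiktoken npm package, and the per-message overhead constants are the ones the Cookbook lists for the 0613-era gpt-3.5-turbo and gpt-4 snapshots.

const { encodingForModel } = require('js-tiktoken')

function numTokensFromMessages(messages, model = 'gpt-3.5-turbo-0613') {
  const enc = encodingForModel(model)
  const tokensPerMessage = 3 // every message is wrapped in role/content delimiters
  const tokensPerName = 1    // an optional name field costs one extra token
  let numTokens = 0
  for (const message of messages) {
    numTokens += tokensPerMessage
    for (const [key, value] of Object.entries(message)) {
      numTokens += enc.encode(value).length
      if (key === 'name') numTokens += tokensPerName
    }
  }
  numTokens += 3 // every reply is primed with <|start|>assistant<|message|>
  return numTokens
}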
Upvotes: 2
Reputation: 1889
This was solved by Reddit user 'bortlip'.
The max_tokens parameter sets the maximum number of tokens for the response (the completion) alone; it does not include the prompt.
From OpenAI:
https://platform.openai.com/docs/api-reference/completions/create#completions/create-max_tokens
The token count of your prompt plus max_tokens cannot exceed the model's context length.
Therefore, to solve the issue, I subtract the token count of the prompt from the model's context length (4097 for text-davinci-003) and pass the result as max_tokens, and it works just fine.
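In code, the fix looks roughly like this (a sketch, again assuming the gpt-3-encoder package for the local count):

const { encode } = require('gpt-3-encoder')

const CONTEXT_LENGTH = 4097 // text-davinci-003, per the error message
const promptTokens = encode(prompt).length

const response = await openai.createCompletion({
  model: 'text-davinci-003',
  prompt,
  // Request only the room the prompt leaves in the context window.
  max_tokens: CONTEXT_LENGTH - promptTokens,
  temperature: 0.2
})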
Upvotes: 5