Reputation: 1504
OpenAI's text models have a context length; e.g., Curie has a context length of 2049 tokens. They provide max_tokens and stop parameters to control the length of the generated sequence, so generation stops either when the stop token is produced or when max_tokens is reached.
The issue is that when generating text, I don't know how many tokens my prompt contains. Since I do not know that, I cannot set max_tokens = 2049 - number_tokens_in_prompt.
This prevents me from generating text dynamically for inputs with a wide range of lengths. What I need is to keep generating until the stop token.
My questions are:
1. How do I count the number of tokens in my prompt so that I can set the max_tokens parameter accordingly?
2. Is there a way to set max_tokens to the max cap so that I won't need to count the number of prompt tokens?
Upvotes: 86
Views: 96676
Reputation: 22930
As stated in the official OpenAI article:
To further explore tokenization, you can use our interactive Tokenizer tool, which allows you to calculate the number of tokens and see how text is broken into tokens. Alternatively, if you'd like to tokenize text programmatically, use tiktoken as a fast BPE tokenizer specifically used for OpenAI models.
A tokenizer can split the text string into a list of tokens, as stated in the official OpenAI example on counting tokens with tiktoken:
tiktoken is a fast open-source tokenizer by OpenAI.
Given a text string (e.g., "tiktoken is great!") and an encoding (e.g., "cl100k_base"), a tokenizer can split the text string into a list of tokens (e.g., ["t", "ik", "token", " is", " great", "!"]).
Splitting text strings into tokens is useful because GPT models see text in the form of tokens. Knowing how many tokens are in a text string can tell you:
- whether the string is too long for a text model to process and
- how much an OpenAI API call costs (as usage is priced by token).
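For instance, once you have a token count (the tiktoken examples below show how to get one), both checks are straightforward. A minimal sketch, where the context-window size and the per-token price are placeholder assumptions, not real limits or prices:

import tiktoken

CONTEXT_WINDOW = 4096                # placeholder: use your model's documented context window
PRICE_PER_1K_INPUT_TOKENS = 0.001    # placeholder: check OpenAI's current pricing page

text = "Hello world, let's test tiktoken."
n_tokens = len(tiktoken.get_encoding("cl100k_base").encode(text))

print(f"{n_tokens} tokens")
print("fits in the context window:", n_tokens <= CONTEXT_WINDOW)
print(f"estimated input cost: ${n_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS:.6f}")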
As of April 2024, tiktoken supports 2 encodings used by OpenAI models (source 1, source 2):
| Encoding name | OpenAI models |
|---|---|
| o200k_base | • GPT-4o models (gpt-4o) |
| cl100k_base | • GPT-4 models (gpt-4) • GPT-3.5 Turbo models (gpt-3.5-turbo) • GPT Base models (davinci-002, babbage-002) • Embeddings models (text-embedding-ada-002, text-embedding-3-large, text-embedding-3-small) • Fine-tuned models (ft:gpt-4, ft:gpt-3.5-turbo, ft:davinci-002, ft:babbage-002) |

Note: The p50k_base and r50k_base encodings were used for models that are deprecated as of April 2024.
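If you'd rather not hard-code the mapping from the table, tiktoken can report it for you. A small sketch (the model names are taken from the table above):

import tiktoken

# Ask tiktoken which encoding it associates with each model name.
for model in ["gpt-4o", "gpt-4", "gpt-3.5-turbo", "text-embedding-3-small"]:
    print(model, "->", tiktoken.encoding_for_model(model).name)
# Expected, per the table: o200k_base for gpt-4o and cl100k_base for the others.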
Official OpenAI libraries:
3rd-party libraries:
To install or upgrade tiktoken:
pip install --upgrade tiktoken
OPTION 1: Search in the table above for the correct encoding for a given OpenAI model
If you run get_tokens_1.py, you'll get the following output:
9
get_tokens_1.py
import tiktoken

def num_tokens_from_string(string: str, encoding_name: str) -> int:
    encoding = tiktoken.get_encoding(encoding_name)
    num_tokens = len(encoding.encode(string))
    return num_tokens

print(num_tokens_from_string("Hello world, let's test tiktoken.", "cl100k_base"))
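If you also want to see how the string is broken into tokens (as in the quoted ["t", "ik", "token", ...] example), you can decode each token id back into its bytes. A short sketch:

import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")
token_ids = encoding.encode("tiktoken is great!")
# Turn each token id back into the bytes it represents.
print([encoding.decode_single_token_bytes(t) for t in token_ids])
# e.g. [b't', b'ik', b'token', b' is', b' great', b'!']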
OPTION 2: Use tiktoken.encoding_for_model() to automatically load the correct encoding for a given OpenAI model
If you run get_tokens_2.py, you'll get the following output:
9
get_tokens_2.py
import tiktoken

def num_tokens_from_string(string: str, model_name: str) -> int:
    encoding = tiktoken.encoding_for_model(model_name)
    num_tokens = len(encoding.encode(string))
    return num_tokens

print(num_tokens_from_string("Hello world, let's test tiktoken.", "gpt-3.5-turbo"))
Note: If you take a careful look at the usage field in the OpenAI API response, you'll see that it reports 10 tokens used for an identical message, i.e., 1 token more than tiktoken; I haven't figured out why. As @Jota mentioned in the comment below, there still seems to be a mismatch between the token usage reported by the OpenAI API response and tiktoken.
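Once you can count prompt tokens, you can set max_tokens dynamically, which is what the question asks for. A minimal sketch for a completions-style call; the model name and the 4,096-token context window here are assumptions, so check the documentation for whatever model you actually use:

import tiktoken
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
CONTEXT_WINDOW = 4096  # assumption: verify your model's real context window

prompt = "Write a short poem about tokenization."
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo-instruct")
prompt_tokens = len(encoding.encode(prompt))

response = client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt=prompt,
    max_tokens=CONTEXT_WINDOW - prompt_tokens,  # leave all remaining room for the completion
)
print(response.choices[0].text)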
Upvotes: 115
Reputation: 327
If you want to count the tokens used by a Chat Completion API request, which has other metadata like role and name in addition to the raw prompt (content), then see the excerpts below from OpenAI's cookbook.
Source Code
import tiktoken

def num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613"):
    """Return the number of tokens used by a list of messages."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        print("Warning: model not found. Using cl100k_base encoding.")
        encoding = tiktoken.get_encoding("cl100k_base")
    if model in {
        "gpt-3.5-turbo-0613",
        "gpt-3.5-turbo-16k-0613",
        "gpt-4-0314",
        "gpt-4-32k-0314",
        "gpt-4-0613",
        "gpt-4-32k-0613",
    }:
        tokens_per_message = 3
        tokens_per_name = 1
    elif model == "gpt-3.5-turbo-0301":
        tokens_per_message = 4  # every message follows <|start|>{role/name}\n{content}<|end|>\n
        tokens_per_name = -1  # if there's a name, the role is omitted
    elif "gpt-3.5-turbo" in model:
        print("Warning: gpt-3.5-turbo may update over time. Returning num tokens assuming gpt-3.5-turbo-0613.")
        return num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613")
    elif "gpt-4" in model:
        print("Warning: gpt-4 may update over time. Returning num tokens assuming gpt-4-0613.")
        return num_tokens_from_messages(messages, model="gpt-4-0613")
    else:
        raise NotImplementedError(
            f"""num_tokens_from_messages() is not implemented for model {model}. See https://github.com/openai/openai-python/blob/main/chatml.md for information on how messages are converted to tokens."""
        )
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":
                num_tokens += tokens_per_name
    num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>
    return num_tokens
Usage
# let's verify the function above matches the OpenAI API response
from openai import OpenAI
import os
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "<your OpenAI API key if not set as env var>"))
example_messages = [
    {
        "role": "system",
        "content": "You are a helpful, pattern-following assistant that translates corporate jargon into plain English.",
    },
    {
        "role": "system",
        "name": "example_user",
        "content": "New synergies will help drive top-line growth.",
    },
    {
        "role": "system",
        "name": "example_assistant",
        "content": "Things working well together will increase revenue.",
    },
    {
        "role": "system",
        "name": "example_user",
        "content": "Let's circle back when we have more bandwidth to touch base on opportunities for increased leverage.",
    },
    {
        "role": "system",
        "name": "example_assistant",
        "content": "Let's talk later when we're less busy about how to do better.",
    },
    {
        "role": "user",
        "content": "This late pivot means we don't have time to boil the ocean for the client deliverable.",
    },
]
for model in [
    "gpt-3.5-turbo-0301",
    "gpt-3.5-turbo-0613",
    "gpt-3.5-turbo",
    "gpt-4-0314",
    "gpt-4-0613",
    "gpt-4",
]:
    print(model)
    # example token count from the function defined above
    print(f"{num_tokens_from_messages(example_messages, model)} prompt tokens counted by num_tokens_from_messages().")
    # example token count from the OpenAI API
    response = client.chat.completions.create(
        model=model,
        messages=example_messages,
        temperature=0,
        max_tokens=1,
    )
    print(f"{response.usage.prompt_tokens} prompt tokens counted by the OpenAI API.")
    print()
Output
gpt-3.5-turbo-0301
127 prompt tokens counted by num_tokens_from_messages().
127 prompt tokens counted by the OpenAI API.
gpt-3.5-turbo-0613
129 prompt tokens counted by num_tokens_from_messages().
129 prompt tokens counted by the OpenAI API.
gpt-3.5-turbo
Warning: gpt-3.5-turbo may update over time. Returning num tokens assuming gpt-3.5-turbo-0613.
129 prompt tokens counted by num_tokens_from_messages().
129 prompt tokens counted by the OpenAI API.
gpt-4-0314
129 prompt tokens counted by num_tokens_from_messages().
129 prompt tokens counted by the OpenAI API.
gpt-4-0613
129 prompt tokens counted by num_tokens_from_messages().
129 prompt tokens counted by the OpenAI API.
gpt-4
Warning: gpt-4 may update over time. Returning num tokens assuming gpt-4-0613.
129 prompt tokens counted by num_tokens_from_messages().
129 prompt tokens counted by the OpenAI API.
Important note from the Cookbook:
Note that the exact way that tokens are counted from messages may change from model to model. Consider the counts from the function below an estimate, not a timeless guarantee.
In the Chat Completion API request, max_tokens represents the maximum tokens for the generated output. To simplify the process of setting max_tokens, you can make a function:
def max_tokens(messages, model):
    input_tokens = num_tokens_from_messages(messages, model=model)
    context_length = get_context_length(model)
    return context_length - input_tokens

def get_context_length(model):
    if model == "gpt-3.5-turbo-0613":
        return 4096
    # Add additional model context windows here.
    else:
        raise ValueError(f"No context length known for model: {model}")
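For example, a sketch that reuses the client and example_messages from the usage snippet above:

model = "gpt-3.5-turbo-0613"
response = client.chat.completions.create(
    model=model,
    messages=example_messages,
    # Give the completion all of the context window that the prompt leaves free.
    max_tokens=max_tokens(example_messages, model),
)
print(response.choices[0].message.content)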
Upvotes: 3
Reputation: 2705
Here is how I do it with Python 3. You can pass either a model name or an encoding string, and get back the encoding, the tokens, or the token count.
token_helper.py:
import tiktoken

def encoding_getter(encoding_type: str):
    """
    Returns the appropriate encoding based on the given encoding type (either an encoding string or a model name).
    """
    if "k_base" in encoding_type:
        return tiktoken.get_encoding(encoding_type)
    else:
        return tiktoken.encoding_for_model(encoding_type)

def tokenizer(string: str, encoding_type: str) -> list:
    """
    Returns the tokens in a text string using the specified encoding.
    """
    encoding = encoding_getter(encoding_type)
    tokens = encoding.encode(string)
    return tokens

def token_counter(string: str, encoding_type: str) -> int:
    """
    Returns the number of tokens in a text string using the specified encoding.
    """
    num_tokens = len(tokenizer(string, encoding_type))
    return num_tokens
Works like this:
>>> import token_helper
>>> token_helper.token_counter("This string will be counted as tokens", "gpt-3.5-turbo")
7
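Because encoding_getter checks for "k_base" in its argument, you can also pass an encoding name directly instead of a model name:

>>> token_helper.token_counter("This string will be counted as tokens", "cl100k_base")
7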
Upvotes: 3
Reputation: 21
With the information contained in the comments, I made this: https://gist.github.com/buanzo/7cdd2c34fc0bb25c71b857a16853c6fa
It is a count_tokens implementation that tries tiktoken, then nltk, and finally falls back to .split().
It also includes a simple TokenBuffer implementation.
We can import the count_tokens function from the token_counter module and call it with our text string as follows:
from token_counter import count_tokens
text = "The quick brown fox jumps over the lazy dog."
result = count_tokens(text, debug=True)
print(result)
If all the required libraries are available, the result is more accurate, but even without tiktoken or nltk the function should still return a dictionary with the number of tokens and the method used to count them. For example:
{'n_tokens': 9, 'method': 'tiktoken'}
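For reference, a minimal sketch of that fallback strategy (illustrative only, not the gist's exact code):

def count_tokens(text: str, debug: bool = False) -> dict:
    """Count tokens with tiktoken if available, else nltk, else a plain .split()."""
    try:
        import tiktoken
        n = len(tiktoken.get_encoding("cl100k_base").encode(text))
        method = "tiktoken"
    except ImportError:
        try:
            import nltk
            n = len(nltk.word_tokenize(text))  # may require nltk.download("punkt")
            method = "nltk"
        except (ImportError, LookupError):
            n = len(text.split())
            method = "split"
    if debug:
        print(f"counted with {method}")
    return {"n_tokens": n, "method": method}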
Upvotes: 1