Reputation: 1504
OpenAI's text models have a context length; e.g., Curie has a context length of 2049 tokens. They provide max_tokens and stop parameters to control the length of the generated sequence, so generation stops either when the stop token is produced or when max_tokens is reached.
The issue is that when generating text, I don't know how many tokens my prompt contains. Since I do not know that, I cannot set max_tokens = 2049 - number_tokens_in_prompt.
This prevents me from generating text dynamically for inputs with a wide range of lengths. What I need is to keep generating until the stop token.
My questions are:
1. How do I count the number of tokens in my prompt so that I can set the max_tokens parameter accordingly?
2. Is there a way to set max_tokens to the max cap so that I won't need to count the number of prompt tokens?
Upvotes: 86
Views: 96676
Reputation: 22930
As stated in the official OpenAI article:
To further explore tokenization, you can use our interactive Tokenizer tool, which allows you to calculate the number of tokens and see how text is broken into tokens. Alternatively, if you'd like to tokenize text programmatically, use tiktoken as a fast BPE tokenizer specifically used for OpenAI models.
A tokenizer can split the text string into a list of tokens, as stated in the official OpenAI example on counting tokens with tiktoken:
tiktoken is a fast open-source tokenizer by OpenAI.
Given a text string (e.g., "tiktoken is great!") and an encoding (e.g., "cl100k_base"), a tokenizer can split the text string into a list of tokens (e.g., ["t", "ik", "token", " is", " great", "!"]).
Splitting text strings into tokens is useful because GPT models see text in the form of tokens. Knowing how many tokens are in a text string can tell you:
- whether the string is too long for a text model to process and
- how much an OpenAI API call costs (as usage is priced by token).
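For instance, once you have a token count (the tiktoken examples below show how to get one), both checks are straightforward. A minimal sketch, where the context-window size and the per-token price are placeholder assumptions, not real limits or prices:

import tiktoken

CONTEXT_WINDOW = 4096                # placeholder: use your model's documented context window
PRICE_PER_1K_INPUT_TOKENS = 0.001    # placeholder: check OpenAI's current pricing page

text = "Hello world, let's test tiktoken."
n_tokens = len(tiktoken.get_encoding("cl100k_base").encode(text))

print(f"{n_tokens} tokens")
print("fits in the context window:", n_tokens <= CONTEXT_WINDOW)
print(f"estimated input cost: ${n_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS:.6f}")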
As of April 2024, tiktoken supports 2 encodings used by OpenAI models (source 1, source 2):
| Encoding name | OpenAI models |
|---|---|
| o200k_base | • GPT-4o models (gpt-4o) |
| cl100k_base | • GPT-4 models (gpt-4) • GPT-3.5 Turbo models (gpt-3.5-turbo) • GPT Base models (davinci-002, babbage-002) • Embeddings models (text-embedding-ada-002, text-embedding-3-large, text-embedding-3-small) • Fine-tuned models (ft:gpt-4, ft:gpt-3.5-turbo, ft:davinci-002, ft:babbage-002) |

Note: The p50k_base and r50k_base encodings were used for models that are deprecated as of April 2024.
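If you'd rather not hard-code the mapping from the table, tiktoken can report it for you. A small sketch (the model names are taken from the table above):

import tiktoken

# Ask tiktoken which encoding it associates with each model name.
for model in ["gpt-4o", "gpt-4", "gpt-3.5-turbo", "text-embedding-3-small"]:
    print(model, "->", tiktoken.encoding_for_model(model).name)
# Expected, per the table: o200k_base for gpt-4o and cl100k_base for the others.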
Official OpenAI libraries:
3rd-party libraries:
To install or upgrade tiktoken:
pip install --upgrade tiktoken
OPTION 1: Search in the table above for the correct encoding for a given OpenAI model
If you run get_tokens_1.py, you'll get the following output:
9
get_tokens_1.py
import tiktoken

def num_tokens_from_string(string: str, encoding_name: str) -> int:
    encoding = tiktoken.get_encoding(encoding_name)
    num_tokens = len(encoding.encode(string))
    return num_tokens

print(num_tokens_from_string("Hello world, let's test tiktoken.", "cl100k_base"))
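If you also want to see how the string is broken into tokens (as in the quoted ["t", "ik", "token", ...] example), you can decode each token id back into its bytes. A short sketch:

import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")
token_ids = encoding.encode("tiktoken is great!")
# Turn each token id back into the bytes it represents.
print([encoding.decode_single_token_bytes(t) for t in token_ids])
# e.g. [b't', b'ik', b'token', b' is', b' great', b'!']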
OPTION 2: Use tiktoken.encoding_for_model() to automatically load the correct encoding for a given OpenAI model
If you run get_tokens_2.py, you'll get the following output:
9
get_tokens_2.py
import tiktoken

def num_tokens_from_string(string: str, model_name: str) -> int:
    encoding = tiktoken.encoding_for_model(model_name)
    num_tokens = len(encoding.encode(string))
    return num_tokens

print(num_tokens_from_string("Hello world, let's test tiktoken.", "gpt-3.5-turbo"))
Note: If you take a careful look at the usage field in the OpenAI API response, you'll see that it reports 10 tokens used for an identical message, i.e., 1 token more than tiktoken; I haven't figured out why. As @Jota mentioned in the comment below, there still seems to be a mismatch between the token usage reported by the OpenAI API response and tiktoken.
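Once you can count prompt tokens, you can set max_tokens dynamically, which is what the question asks for. A minimal sketch for a completions-style call; the model name and the 4,096-token context window here are assumptions, so check the documentation for whatever model you actually use:

import tiktoken
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
CONTEXT_WINDOW = 4096  # assumption: verify your model's real context window

prompt = "Write a short poem about tokenization."
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo-instruct")
prompt_tokens = len(encoding.encode(prompt))

response = client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt=prompt,
    max_tokens=CONTEXT_WINDOW - prompt_tokens,  # leave all remaining room for the completion
)
print(response.choices[0].text)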
Upvotes: 115
Reputation: 327
If you want to count the tokens used by a Chat Completion API request, which has other metadata like role and name in addition to the raw prompt (content), then see the excerpts below from OpenAI's cookbook.
Source Code
import tiktoken

def num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613"):
    """Return the number of tokens used by a list of messages."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        print("Warning: model not found. Using cl100k_base encoding.")
        encoding = tiktoken.get_encoding("cl100k_base")
    if model in {
        "gpt-3.5-turbo-0613",
        "gpt-3.5-turbo-16k-0613",
        "gpt-4-0314",
        "gpt-4-32k-0314",
        "gpt-4-0613",
        "gpt-4-32k-0613",
    }:
        tokens_per_message = 3
        tokens_per_name = 1
    elif model == "gpt-3.5-turbo-0301":
        tokens_per_message = 4  # every message follows <|start|>{role/name}\n{content}<|end|>\n
        tokens_per_name = -1  # if there's a name, the role is omitted
    elif "gpt-3.5-turbo" in model:
        print("Warning: gpt-3.5-turbo may update over time. Returning num tokens assuming gpt-3.5-turbo-0613.")
        return num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613")
    elif "gpt-4" in model:
        print("Warning: gpt-4 may update over time. Returning num tokens assuming gpt-4-0613.")
        return num_tokens_from_messages(messages, model="gpt-4-0613")
    else:
        raise NotImplementedError(
            f"""num_tokens_from_messages() is not implemented for model {model}. See https://github.com/openai/openai-python/blob/main/chatml.md for information on how messages are converted to tokens."""
        )
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":
                num_tokens += tokens_per_name
    num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>
    return num_tokens
Usage
# let's verify the function above matches the OpenAI API response
from openai import OpenAI
import os
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "<your OpenAI API key if not set as env var>"))
example_messages = [
    {
        "role": "system",
        "content": "You are a helpful, pattern-following assistant that translates corporate jargon into plain English.",
    },
    {
        "role": "system",
        "name": "example_user",
        "content": "New synergies will help drive top-line growth.",
    },
    {
        "role": "system",
        "name": "example_assistant",
        "content": "Things working well together will increase revenue.",
    },
    {
        "role": "system",
        "name": "example_user",
        "content": "Let's circle back when we have more bandwidth to touch base on opportunities for increased leverage.",
    },
    {
        "role": "system",
        "name": "example_assistant",
        "content": "Let's talk later when we're less busy about how to do better.",
    },
    {
        "role": "user",
        "content": "This late pivot means we don't have time to boil the ocean for the client deliverable.",
    },
]
for model in [
    "gpt-3.5-turbo-0301",
    "gpt-3.5-turbo-0613",
    "gpt-3.5-turbo",
    "gpt-4-0314",
    "gpt-4-0613",
    "gpt-4",
]:
    print(model)
    # example token count from the function defined above
    print(f"{num_tokens_from_messages(example_messages, model)} prompt tokens counted by num_tokens_from_messages().")
    # example token count from the OpenAI API
    response = client.chat.completions.create(
        model=model,
        messages=example_messages,
        temperature=0,
        max_tokens=1,
    )
    print(f"{response.usage.prompt_tokens} prompt tokens counted by the OpenAI API.")
    print()
Output
gpt-3.5-turbo-0301
127 prompt tokens counted by num_tokens_from_messages().
127 prompt tokens counted by the OpenAI API.
gpt-3.5-turbo-0613
129 prompt tokens counted by num_tokens_from_messages().
129 prompt tokens counted by the OpenAI API.
gpt-3.5-turbo
Warning: gpt-3.5-turbo may update over time. Returning num tokens assuming gpt-3.5-turbo-0613.
129 prompt tokens counted by num_tokens_from_messages().
129 prompt tokens counted by the OpenAI API.
gpt-4-0314
129 prompt tokens counted by num_tokens_from_messages().
129 prompt tokens counted by the OpenAI API.
gpt-4-0613
129 prompt tokens counted by num_tokens_from_messages().
129 prompt tokens counted by the OpenAI API.
gpt-4
Warning: gpt-4 may update over time. Returning num tokens assuming gpt-4-0613.
129 prompt tokens counted by num_tokens_from_messages().
129 prompt tokens counted by the OpenAI API.
Important note from the Cookbook:
Note that the exact way that tokens are counted from messages may change from model to model. Consider the counts from the function below an estimate, not a timeless guarantee.
In the Chat Completion API request, max_tokens represents the maximum tokens for the generated output. To simplify the process of setting max_tokens, you can make a function:
def max_tokens(messages, model):
    input_tokens = num_tokens_from_messages(messages, model=model)
    context_length = get_context_length(model)
    return context_length - input_tokens

def get_context_length(model):
    if model == "gpt-3.5-turbo-0613":
        return 4096
    # Add additional model context windows here.
    else:
        raise ValueError(f"No context length known for model: {model}")
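For example, a sketch that reuses the client and example_messages from the usage snippet above:

model = "gpt-3.5-turbo-0613"
response = client.chat.completions.create(
    model=model,
    messages=example_messages,
    # Give the completion all of the context window that the prompt leaves free.
    max_tokens=max_tokens(example_messages, model),
)
print(response.choices[0].message.content)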
Upvotes: 3
Reputation: 2705
Here is how I do it with Python 3. You can pass either a model name or an encoding string, and get back the encoding, the tokens, or the token count.
token_helper.py:
import tiktoken

def encoding_getter(encoding_type: str):
    """
    Returns the appropriate encoding based on the given encoding type (either an encoding string or a model name).
    """
    if "k_base" in encoding_type:
        return tiktoken.get_encoding(encoding_type)
    else:
        return tiktoken.encoding_for_model(encoding_type)

def tokenizer(string: str, encoding_type: str) -> list:
    """
    Returns the tokens in a text string using the specified encoding.
    """
    encoding = encoding_getter(encoding_type)
    tokens = encoding.encode(string)
    return tokens

def token_counter(string: str, encoding_type: str) -> int:
    """
    Returns the number of tokens in a text string using the specified encoding.
    """
    num_tokens = len(tokenizer(string, encoding_type))
    return num_tokens
Works like this:
>>> import token_helper
>>> token_helper.token_counter("This string will be counted as tokens", "gpt-3.5-turbo")
7
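Because encoding_getter checks for "k_base" in its argument, you can also pass an encoding name directly instead of a model name:

>>> token_helper.token_counter("This string will be counted as tokens", "cl100k_base")
7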
Upvotes: 3
Reputation: 21
With the information contained in the comments, I made this: https://gist.github.com/buanzo/7cdd2c34fc0bb25c71b857a16853c6fa
It is a count_tokens implementation that tries tiktoken, then nltk, and finally falls back to .split().
It also includes a simple TokenBuffer implementation.
We can import the count_tokens function from the token_counter module and call it with our text string as follows:
from token_counter import count_tokens
text = "The quick brown fox jumps over the lazy dog."
result = count_tokens(text, debug=True)
print(result)
If all the required libraries are available, the result is more accurate, but even without tiktoken or nltk the function should still return a dictionary with the number of tokens and the method used to count them. For example:
{'n_tokens': 9, 'method': 'tiktoken'}
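For reference, a minimal sketch of that fallback strategy (illustrative only, not the gist's exact code):

def count_tokens(text: str, debug: bool = False) -> dict:
    """Count tokens with tiktoken if available, else nltk, else a plain .split()."""
    try:
        import tiktoken
        n = len(tiktoken.get_encoding("cl100k_base").encode(text))
        method = "tiktoken"
    except ImportError:
        try:
            import nltk
            n = len(nltk.word_tokenize(text))  # may require nltk.download("punkt")
            method = "nltk"
        except (ImportError, LookupError):
            n = len(text.split())
            method = "split"
    if debug:
        print(f"counted with {method}")
    return {"n_tokens": n, "method": method}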
Upvotes: 1