Reputation: 91
I am writing a function that talks to ChatGPT and returns the output as a stream, but I found that the API does not seem to provide the token usage when I use streaming output.
This is the function for streaming output:
/**
 * Talk with ChatGPT and stream the reply.
 * @param msg
 * @param callback
 * @example chatWithGPTWithStreaming([{role: "system", content: "You are a helpful assistant."}, {role: "user", content: "Hello world"}], (text) => { console.log(text); })
 */
export async function chatWithGPTWithStreaming(msg: any, callback: Function) {
    const chatCompletion = await openai.createChatCompletion({
        model: 'gpt-3.5-turbo',
        messages: msg,
        stream: true,
    }, {responseType: "stream"});
    let fullText = ''; // accumulate the streamed reply here instead of mutating the raw chunk
    chatCompletion.data.on('data', data => {
        const lines = data.toString().split('\n').filter(line => line.trim() !== '');
        for (const line of lines) {
            const message = line.replace(/^data: /, '');
            if (message === '[DONE]') {
                console.log("text is end");
                console.log(chatCompletion);
                // callback(false);
                return; // Stream finished
            }
            try {
                const parsed = JSON.parse(message);
                const text = parsed.choices[0].delta.content;
                if (text) {
                    fullText += text;
                    console.log(text);
                    callback(text);
                }
            } catch (error) {
                console.error('Could not JSON parse stream message', message, error);
            }
        }
    });
    console.log(chatCompletion);
}
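The data handler above boils down to parsing server-sent-event lines out of each chunk. A minimal standalone sketch of that parsing step (the function name is my own), which also makes it easy to collect the full reply for token counting later:

```javascript
// Parse one SSE chunk from the streaming API into an array of delta strings.
// Returns { texts, done }, where done becomes true once the "[DONE]" sentinel arrives.
function parseStreamChunk(chunk) {
    const texts = [];
    let done = false;
    const lines = chunk.toString().split('\n').filter(line => line.trim() !== '');
    for (const line of lines) {
        const message = line.replace(/^data: /, '');
        if (message === '[DONE]') {
            done = true;
            break;
        }
        try {
            const parsed = JSON.parse(message);
            const text = parsed.choices[0].delta.content;
            if (text) texts.push(text);
        } catch (error) {
            // ignore lines that are not complete JSON objects
        }
    }
    return { texts, done };
}
```

Joining the `texts` arrays from every chunk gives the complete completion text, which is what you need for counting completion tokens yourself.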
This is an example of the value of data:
But when I don't use streaming output:
export async function chatWithGPT(msg: any) {
    const completion = await openai.createChatCompletion({
        model: "gpt-3.5-turbo",
        messages: [
            {role: "system", content: "You are a helpful assistant."},
            {role: "user", content: "Hello!"},
        ],
    });
    console.log(completion.data.choices[0].message);
}
At this point, I can obtain usage.total_tokens from the response. So how can I obtain the token usage while using streaming output?
I noticed that the Tokenizer provided by OpenAI can be used. However, the token count I calculated with the Tokenizer is different from the value returned by the ChatGPT API. For this conversation, the API returns a prompt_tokens of 19:
{role: "system", content: "You are a helpful assistant."},
{role: "user", content: "Hello!"},
But the count given by the Tokenizer is 9.
Any help would be greatly appreciated.
Upvotes: 1
Views: 3665
Reputation: 91
I got the answer by porting this Python code.
Here is the JavaScript version; its only dependency is GPT-3-Encoder.
const { encode } = require('gpt-3-encoder');

function numTokensFromMessages(messages, model = "gpt-3.5-turbo-0613") {
    let tokens_per_message = 0;
    let tokens_per_name = 0;
    if (["gpt-3.5-turbo-0613", "gpt-3.5-turbo-16k-0613", "gpt-4-0314", "gpt-4-32k-0314", "gpt-4-0613", "gpt-4-32k-0613"].includes(model)) {
        tokens_per_message = 3;
        tokens_per_name = 1;
    } else if (model == "gpt-3.5-turbo-0301") {
        tokens_per_message = 4; // every message follows <|start|>{role/name}\n{content}<|end|>\n
        tokens_per_name = -1; // if there's a name, the role is omitted
    } else if (model.includes("gpt-3.5-turbo")) {
        console.log("Warning: gpt-3.5-turbo may update over time. Returning num tokens assuming gpt-3.5-turbo-0613.");
        return numTokensFromMessages(messages, "gpt-3.5-turbo-0613");
    } else if (model.includes("gpt-4")) {
        console.log("Warning: gpt-4 may update over time. Returning num tokens assuming gpt-4-0613.");
        return numTokensFromMessages(messages, "gpt-4-0613");
    } else {
        throw new Error(`num_tokens_from_messages() is not implemented for model ${model}. See https://github.com/openai/openai-python/blob/main/chatml.md for information on how messages are converted to tokens.`);
    }
    let num_tokens = 0;
    for (const message of messages) {
        num_tokens += tokens_per_message;
        for (const key in message) {
            num_tokens += encode(message[key]).length;
            if (key == "name") {
                num_tokens += tokens_per_name;
            }
        }
    }
    num_tokens += 3; // every reply is primed with <|start|>assistant<|message|>
    return num_tokens;
}
// usage:
const testToken = numTokensFromMessages([
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Hello!" },
    { role: "assistant", content: "What can I help you with today?" },
    { role: "user", content: "I'd like to book a hotel in Berlin." },
]);
console.log(testToken);
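With the prompt side handled by numTokensFromMessages, the completion side of a streamed response can be estimated by encoding the concatenated deltas you collected from the stream. A rough sketch (the function name is my own; in practice you would pass require('gpt-3-encoder').encode as encodeFn):

```javascript
// Rough usage estimate for a streamed chat completion: prompt tokens come
// from numTokensFromMessages, completion tokens from encoding the joined
// deltas. encodeFn is any BPE encoder, e.g. require('gpt-3-encoder').encode.
function estimateStreamUsage(promptTokens, deltas, encodeFn) {
    const completionTokens = encodeFn(deltas.join('')).length;
    return {
        prompt_tokens: promptTokens,
        completion_tokens: completionTokens,
        total_tokens: promptTokens + completionTokens,
    };
}
```

Keep in mind this is still an estimate; the only authoritative numbers are the usage fields the API returns on non-streaming requests.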
Upvotes: 2