Reputation: 91
I am writing a function that talks to ChatGPT and returns the output as a stream, but I found that the API does not seem to provide the token usage when I use streaming output.
This is the function for streaming output:
/**
 * Talk with ChatGPT and stream the reply.
 * @param msg
 * @param callback
 * @example chatWithGPTWithStreaming([{role: "system", content: "You are a helpful assistant."}, {role: "user", content: "Hello world"}], (text) => { console.log(text); })
 */
export async function chatWithGPTWithStreaming(msg: any, callback: Function) {
    const chatCompletion = await openai.createChatCompletion({
        model: 'gpt-3.5-turbo',
        messages: msg,
        stream: true,
    }, {responseType: "stream"});
    let fullText = ''; // accumulate the streamed reply here instead of mutating the raw chunk
    chatCompletion.data.on('data', data => {
        const lines = data.toString().split('\n').filter(line => line.trim() !== '');
        for (const line of lines) {
            const message = line.replace(/^data: /, '');
            if (message === '[DONE]') {
                console.log("text is end");
                console.log(chatCompletion);
                // callback(false);
                return; // Stream finished
            }
            try {
                const parsed = JSON.parse(message);
                const text = parsed.choices[0].delta.content;
                if (text) {
                    fullText += text;
                    console.log(text);
                    callback(text);
                }
            } catch (error) {
                console.error('Could not JSON parse stream message', message, error);
            }
        }
    });
    console.log(chatCompletion);
}
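The data handler above boils down to parsing server-sent-event lines out of each chunk. A minimal standalone sketch of that parsing step (the function name is my own), which also makes it easy to collect the full reply for token counting later:

```javascript
// Parse one SSE chunk from the streaming API into an array of delta strings.
// Returns { texts, done }, where done becomes true once the "[DONE]" sentinel arrives.
function parseStreamChunk(chunk) {
    const texts = [];
    let done = false;
    const lines = chunk.toString().split('\n').filter(line => line.trim() !== '');
    for (const line of lines) {
        const message = line.replace(/^data: /, '');
        if (message === '[DONE]') {
            done = true;
            break;
        }
        try {
            const parsed = JSON.parse(message);
            const text = parsed.choices[0].delta.content;
            if (text) texts.push(text);
        } catch (error) {
            // ignore lines that are not complete JSON objects
        }
    }
    return { texts, done };
}
```

Joining the `texts` arrays from every chunk gives the complete completion text, which is what you need for counting completion tokens yourself.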
This is an example of the value of data:
But when I don't use streaming output:
export async function chatWithGPT(msg: any) {
    const completion = await openai.createChatCompletion({
        model: "gpt-3.5-turbo",
        messages: [
            {role: "system", content: "You are a helpful assistant."},
            {role: "user", content: "Hello!"},
        ],
    });
    console.log(completion.data.choices[0].message);
}
At this point, I can obtain usage.total_tokens from the response. So how can I obtain the token usage while using streaming output?
I noticed that the Tokenizer provided by OpenAI can be used. However, the token count I calculated with the Tokenizer is different from the value returned by the ChatGPT API. For this conversation, the API returns a prompt_tokens of 19:
{role: "system", content: "You are a helpful assistant."},
{role: "user", content: "Hello!"},
But the count given by the Tokenizer is 9.
Any help would be greatly appreciated.
Upvotes: 1
Views: 3665
Reputation: 91
I got the answer by porting this Python code.
Here is the JavaScript version; its only dependency is GPT-3-Encoder.
const { encode } = require('gpt-3-encoder');

function numTokensFromMessages(messages, model = "gpt-3.5-turbo-0613") {
    let tokens_per_message = 0;
    let tokens_per_name = 0;
    if (["gpt-3.5-turbo-0613", "gpt-3.5-turbo-16k-0613", "gpt-4-0314", "gpt-4-32k-0314", "gpt-4-0613", "gpt-4-32k-0613"].includes(model)) {
        tokens_per_message = 3;
        tokens_per_name = 1;
    } else if (model == "gpt-3.5-turbo-0301") {
        tokens_per_message = 4; // every message follows <|start|>{role/name}\n{content}<|end|>\n
        tokens_per_name = -1; // if there's a name, the role is omitted
    } else if (model.includes("gpt-3.5-turbo")) {
        console.log("Warning: gpt-3.5-turbo may update over time. Returning num tokens assuming gpt-3.5-turbo-0613.");
        return numTokensFromMessages(messages, "gpt-3.5-turbo-0613");
    } else if (model.includes("gpt-4")) {
        console.log("Warning: gpt-4 may update over time. Returning num tokens assuming gpt-4-0613.");
        return numTokensFromMessages(messages, "gpt-4-0613");
    } else {
        throw new Error(`num_tokens_from_messages() is not implemented for model ${model}. See https://github.com/openai/openai-python/blob/main/chatml.md for information on how messages are converted to tokens.`);
    }
    let num_tokens = 0;
    for (const message of messages) {
        num_tokens += tokens_per_message;
        for (const key in message) {
            num_tokens += encode(message[key]).length;
            if (key == "name") {
                num_tokens += tokens_per_name;
            }
        }
    }
    num_tokens += 3; // every reply is primed with <|start|>assistant<|message|>
    return num_tokens;
}
// usage:
const testToken = numTokensFromMessages([
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Hello!" },
    { role: "assistant", content: "What can I help you with today?" },
    { role: "user", content: "I'd like to book a hotel in Berlin." },
]);
console.log(testToken);
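With the prompt side handled by numTokensFromMessages, the completion side of a streamed response can be estimated by encoding the concatenated deltas you collected from the stream. A rough sketch (the function name is my own; in practice you would pass require('gpt-3-encoder').encode as encodeFn):

```javascript
// Rough usage estimate for a streamed chat completion: prompt tokens come
// from numTokensFromMessages, completion tokens from encoding the joined
// deltas. encodeFn is any BPE encoder, e.g. require('gpt-3-encoder').encode.
function estimateStreamUsage(promptTokens, deltas, encodeFn) {
    const completionTokens = encodeFn(deltas.join('')).length;
    return {
        prompt_tokens: promptTokens,
        completion_tokens: completionTokens,
        total_tokens: promptTokens + completionTokens,
    };
}
```

Keep in mind this is still an estimate; the only authoritative numbers are the usage fields the API returns on non-streaming requests.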
Upvotes: 2