Reputation: 127
I know that their API returns usage in onFinish, but I want to count the tokens myself.
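(For reference, the number I am comparing against is the usage object from onFinish; I read it roughly like this, assuming the promptTokens/completionTokens shape I see in the AI SDK types:)
streamText({
  // ...same options as in the call below...
  onFinish: ({ usage }) => {
    // These are the counts that match the OpenAI logs.
    console.log(usage.promptTokens, usage.completionTokens);
  },
});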
I am trying to count tokens for gpt-4o-2024-05-13, which I can tokenize using https://www.npmjs.com/package/gpt-tokenizer.
However, the problem I am running into is that there is a wildly large difference between what I count as the input and what Vercel reports (the OpenAI logs match Vercel's reporting, so I know it is accurate).
const { fullStream } = await streamText({
  abortSignal: signal,
  maxSteps: 20,
  messages: truncatedMessages,
  model: createModel(llmModel.nid),
  tools: await createTools({
    chatSessionMessageId,
  }),
});

for await (const chunk of fullStream) {
  // ...
}
So assuming that this is how I am sending messages to the LLM, that I am streaming the response, and that I have a function tokenize(subject: string): string[], what is the correct way to calculate the tokens used by the prompt and the completion?
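(For reference, tokenize is just a thin wrapper over gpt-tokenizer, roughly along these lines; I am assuming its o200k_base entry point is the right encoding for gpt-4o:)
import { encode, decode } from 'gpt-tokenizer/encoding/o200k_base';

// Splits a string into its individual token strings, so that
// tokenize(text).length is the token count.
const tokenize = (subject: string): string[] => {
  return encode(subject).map((token) => decode([token]));
};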
For context, what I've tried is something like:
let content = '';
for await (const chunk of fullStream) {
  if (chunk.type === 'text-delta') {
    content += chunk.textDelta;
  }
}
tokenize(content).length;
I would expect this to give an accurate completion_tokens count, but the number Vercel reports is almost 40% higher.
I tried this to count input:
tokenize(
  truncatedMessages
    .map((message) => {
      return message.content;
    })
    .join('\n'),
).length;
but that's also a lot less than what Vercel/OpenAI reports.
Where do the extra tokens come from?
Upvotes: 0
Views: 207