dmm-l-mediehus

Reputation: 11

Calculating total cost of OpenAI responses with cached input tokens

Should I subtract the number of cached input tokens from the number of input tokens when I receive the 'Usage' object back in the OpenAI response?

I'm trying to calculate the total cost of the response for the following values from the returned usage object:

Input tokens: 1204
Cached input tokens: 1024
Output tokens: 12

Given the following OpenAI pricing:

Input: $0.15 per 1M tokens
Cached input: $0.075 per 1M tokens
Output: $0.60 per 1M tokens

I tried to use their official website to find the solution but could not.

I was expecting the number of input tokens to be lower, i.e. that the value returned would already have the cached input tokens subtracted (so the result would be 180). I'm not sure whether I should subtract the cached tokens myself when calculating the total cost.
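For reference, this is roughly how I'm reading the values (openai Python SDK, Chat Completions endpoint):

usage = response.usage
print(usage.prompt_tokens)                        # 1204 -- I expected 180 here
print(usage.prompt_tokens_details.cached_tokens)  # 1024
print(usage.completion_tokens)                    # 12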

Upvotes: 1

Views: 131

Answers (1)

Kyle F. Hartzenberg

Reputation: 3710

Yes: the input token count reported in the usage object includes the cached tokens, so when pricing a request you subtract the cached count yourself to get the non-cached portion (billed at the full input rate) and bill the cached portion at the discounted rate. Given the values you've provided, the costs are as follows:

Cost per input token      $ 0.00000015
Cost per cached token     $ 0.00000007
Cost per output token     $ 0.00000060
Num. input tokens           1204
Num. cached tokens          1024
Num. non-cached tokens      180
Num. output tokens          12
Cost (non-cached input)   $ 0.00002700
Cost (cached input)       $ 0.00007680
Cost (input)              $ 0.00010380
Cost (output)             $ 0.00000720
Cost (total)              $ 0.00011100

Calculated using the following:

# Per-token prices in USD ($0.15, $0.075 and $0.60 per 1M tokens)
cost_per_input_tok = 0.15 / 1000000
cost_per_cached_tok = 0.075 / 1000000
cost_per_output_tok = 0.6 / 1000000

# Token counts as reported in the response's usage object
num_input_toks = 1204
num_cached_toks = 1024
num_output_toks = 12

# The reported input count includes cached tokens, so subtract to get
# the portion billed at the full input rate
num_non_cached_toks = num_input_toks - num_cached_toks
non_cached_cost = num_non_cached_toks * cost_per_input_tok
cached_cost = num_cached_toks * cost_per_cached_tok
input_cost = non_cached_cost + cached_cost
output_cost = num_output_toks * cost_per_output_tok
total_cost = input_cost + output_cost

print("{:<25}{:>2} {:<10.8f}".format("Cost per input token", "$", cost_per_input_tok))
print("{:<25}{:>2} {:<10.8f}".format("Cost per cached token", "$", cost_per_cached_tok))
print("{:<25}{:>2} {:<10.8f}".format("Cost per output token", "$", cost_per_output_tok))
print("{:<25}{:>2} {:<10}".format("Num. input tokens", "", num_input_toks))
print("{:<25}{:>2} {:<10}".format("Num. cached tokens", "", num_cached_toks))
print("{:<25}{:>2} {:<10}".format("Num. non-cached tokens", "", num_non_cached_toks))
print("{:<25}{:>2} {:<10}".format("Num. output tokens", "", num_output_toks))
print("{:<25}{:>2} {:<10.8f}".format("Cost (non-cached input)", "$", non_cached_cost))
print("{:<25}{:>2} {:<10.8f}".format("Cost (cached input)", "$", cached_cost))
print("{:<25}{:>2} {:<10.8f}".format("Cost (input)", "$", input_cost))
print("{:<25}{:>2} {:<10.8f}".format("Cost (output)", "$", output_cost))
print("{:<25}{:>2} {:<10.8f}".format("Cost (total)", "$", total_cost))

Some definitions:

  • Input. The number of tokens input to the model (i.e. the total number of tokens in your prompt).
  • Cached input. The number of cached tokens used when processing the input.
  • Output. The tokens generated by the model in response to the input (i.e. the total number of tokens in the completion).
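Putting those definitions together, the whole calculation collapses to a single formula (the same arithmetic as the code above):

total_cost = (
    (num_input_toks - num_cached_toks) * cost_per_input_tok
    + num_cached_toks * cost_per_cached_tok
    + num_output_toks * cost_per_output_tok
)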

Some information on how input caching works:

  • The input must be at least 1,024 tokens long to benefit from input caching.
  • The API caches the longest prefix of an input that has been previously computed, starting at 1,024 tokens and increasing in 128-token increments (i.e. 1024, 1152, 1280, 1408 etc.).
  • If common prefixes to inputs are used, the input caching discount will be automatically applied.
  • Caches are typically cleared after 5-10 minutes of inactivity and are always removed within one hour of the cache's last use.
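To make the increment rule concrete, here is a small hypothetical helper that computes the longest cache-eligible prefix for a given input length; note that the number of tokens actually cached also depends on which prefixes the API has seen recently:

def longest_cacheable_prefix(n_tokens: int) -> int:
    # Inputs below the 1,024-token minimum are never cached
    if n_tokens < 1024:
        return 0
    # Cacheable prefix lengths grow from 1,024 in 128-token steps
    return 1024 + (n_tokens - 1024) // 128 * 128

print(longest_cacheable_prefix(1000))  # 0
print(longest_cacheable_prefix(1204))  # 1152
print(longest_cacheable_prefix(1500))  # 1408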

References

Prompt Caching in the API (2024) OpenAI. Available at: https://openai.com/index/api-prompt-caching (Accessed: 20 February 2025).

Upvotes: 0
