dmm-l-mediehus

Reputation: 11

Calculating total cost of OpenAI responses with cached input tokens

Should I subtract the number of cached input tokens from the number of input tokens when I receive the 'Usage' object back in the OpenAI response?

I'm trying to calculate the total cost of the response for the following values from the returned usage object:

Input tokens: 1204
Cached input tokens: 1024
Output tokens: 12

Given the following OpenAI pricing:

Input: $0.15 per 1M tokens
Cached input: $0.075 per 1M tokens
Output: $0.60 per 1M tokens

I tried to use their official website to find the solution but could not.

I was expecting the number of input tokens to be lower, i.e. that the value returned would already have the cached input tokens subtracted (so the result would be 180). I'm not sure whether I should subtract the cached tokens myself when calculating the total cost.
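For reference, this is roughly how I'm reading the values (openai Python SDK, Chat Completions endpoint):

usage = response.usage
print(usage.prompt_tokens)                        # 1204 -- I expected 180 here
print(usage.prompt_tokens_details.cached_tokens)  # 1024
print(usage.completion_tokens)                    # 12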

Upvotes: 1

Views: 131

Answers (1)

Kyle F. Hartzenberg

Reputation: 3710

Yes: the input token count reported in the usage object includes the cached tokens, so when pricing a request you subtract the cached count yourself to get the non-cached portion (billed at the full input rate) and bill the cached portion at the discounted rate. Given the values you've provided, the costs are as follows:

Cost per input token      $ 0.00000015
Cost per cached token     $ 0.00000007
Cost per output token     $ 0.00000060
Num. input tokens           1204
Num. cached tokens          1024
Num. non-cached tokens      180
Num. output tokens          12
Cost (non-cached input)   $ 0.00002700
Cost (cached input)       $ 0.00007680
Cost (input)              $ 0.00010380
Cost (output)             $ 0.00000720
Cost (total)              $ 0.00011100

Calculated using the following:

# Per-token prices in USD ($0.15, $0.075 and $0.60 per 1M tokens)
cost_per_input_tok = 0.15 / 1000000
cost_per_cached_tok = 0.075 / 1000000
cost_per_output_tok = 0.6 / 1000000

# Token counts as reported in the response's usage object
num_input_toks = 1204
num_cached_toks = 1024
num_output_toks = 12

# The reported input count includes cached tokens, so subtract to get
# the portion billed at the full input rate
num_non_cached_toks = num_input_toks - num_cached_toks
non_cached_cost = num_non_cached_toks * cost_per_input_tok
cached_cost = num_cached_toks * cost_per_cached_tok
input_cost = non_cached_cost + cached_cost
output_cost = num_output_toks * cost_per_output_tok
total_cost = input_cost + output_cost

print("{:<25}{:>2} {:<10.8f}".format("Cost per input token", "$", cost_per_input_tok))
print("{:<25}{:>2} {:<10.8f}".format("Cost per cached token", "$", cost_per_cached_tok))
print("{:<25}{:>2} {:<10.8f}".format("Cost per output token", "$", cost_per_output_tok))
print("{:<25}{:>2} {:<10}".format("Num. input tokens", "", num_input_toks))
print("{:<25}{:>2} {:<10}".format("Num. cached tokens", "", num_cached_toks))
print("{:<25}{:>2} {:<10}".format("Num. non-cached tokens", "", num_non_cached_toks))
print("{:<25}{:>2} {:<10}".format("Num. output tokens", "", num_output_toks))
print("{:<25}{:>2} {:<10.8f}".format("Cost (non-cached input)", "$", non_cached_cost))
print("{:<25}{:>2} {:<10.8f}".format("Cost (cached input)", "$", cached_cost))
print("{:<25}{:>2} {:<10.8f}".format("Cost (input)", "$", input_cost))
print("{:<25}{:>2} {:<10.8f}".format("Cost (output)", "$", output_cost))
print("{:<25}{:>2} {:<10.8f}".format("Cost (total)", "$", total_cost))

Some definitions:

  • Input. The number of tokens input to the model (i.e. the total number of tokens in your prompt).
  • Cached input. The number of cached tokens used when processing the input.
  • Output. The tokens generated by the model in response to the input (i.e. the total number of tokens in the completion).
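Putting those definitions together, the whole calculation collapses to a single formula (the same arithmetic as the code above):

total_cost = (
    (num_input_toks - num_cached_toks) * cost_per_input_tok
    + num_cached_toks * cost_per_cached_tok
    + num_output_toks * cost_per_output_tok
)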

Some information on how input caching works:

  • The input must be at least 1,024 tokens long to benefit from input caching.
  • The API caches the longest prefix of an input that has been previously computed, starting at 1,024 tokens and increasing in 128-token increments (i.e. 1024, 1152, 1280, 1408 etc.).
  • If common prefixes to inputs are used, the input caching discount will be automatically applied.
  • Caches are typically cleared after 5-10 minutes of inactivity and are always removed within one hour of the cache's last use.
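To make the increment rule concrete, here is a small hypothetical helper that computes the longest cache-eligible prefix for a given input length; note that the number of tokens actually cached also depends on which prefixes the API has seen recently:

def longest_cacheable_prefix(n_tokens: int) -> int:
    # Inputs below the 1,024-token minimum are never cached
    if n_tokens < 1024:
        return 0
    # Cacheable prefix lengths grow from 1,024 in 128-token steps
    return 1024 + (n_tokens - 1024) // 128 * 128

print(longest_cacheable_prefix(1000))  # 0
print(longest_cacheable_prefix(1204))  # 1152
print(longest_cacheable_prefix(1500))  # 1408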

References

Prompt Caching in the API (2024) OpenAI. Available at: https://openai.com/index/api-prompt-caching (Accessed: 20 February 2025).

Upvotes: 0
