Reputation: 11
Should I subtract the number of cached input tokens from the number of input tokens when I receive the 'Usage' object back in the OpenAI response?
I'm trying to calculate the total cost of the response for the following values:
Given the following OpenAI pricing:
I looked through OpenAI's official pricing page for an answer but couldn't find one.
I was expecting the number of input tokens to be lower, i.e. that the returned value would already have the cached input tokens subtracted (so the result would be 180). I'm not sure whether I should subtract the cached tokens myself when calculating the total cost.
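For reference, here is roughly what the usage block I get back looks like, hard-coded here as a plain dict in place of the real response object (field names are what I see in the Chat Completions response; treat them as illustrative):

```python
# Stand-in for the 'usage' block of a Chat Completions response.
usage = {
    "prompt_tokens": 1204,       # reported input tokens
    "completion_tokens": 12,     # output tokens
    "total_tokens": 1216,
    "prompt_tokens_details": {"cached_tokens": 1024},
}

# If the cached tokens do need subtracting, the non-cached portion would be:
non_cached = usage["prompt_tokens"] - usage["prompt_tokens_details"]["cached_tokens"]
print(non_cached)  # 180
```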
Upvotes: 1
Views: 131
Reputation: 3710
Given the values you've provided, the costs are as follows:
Cost per input token $ 0.00000015
Cost per cached token $ 0.00000007
Cost per output token $ 0.00000060
Num. input tokens 1204
Num. cached tokens 1024
Num. non-cached tokens 180
Num. output tokens 12
Cost (non-cached input) $ 0.00002700
Cost (cached input) $ 0.00007680
Cost (input) $ 0.00010380
Cost (output) $ 0.00000720
Cost (total) $ 0.00011100
Calculated using the following:
# Prices are quoted per 1M tokens, so divide to get per-token rates.
cost_per_input_tok = 0.15 / 1000000    # $0.15 per 1M non-cached input tokens
cost_per_cached_tok = 0.075 / 1000000  # $0.075 per 1M cached input tokens
cost_per_output_tok = 0.6 / 1000000    # $0.60 per 1M output tokens

num_input_toks = 1204   # total input tokens reported in the usage object (includes cached)
num_cached_toks = 1024  # cached input tokens reported in the usage object
num_output_toks = 12    # output (completion) tokens

# The reported input-token count already includes the cached tokens,
# so subtract them to get the portion billed at the full input rate.
num_non_cached_toks = num_input_toks - num_cached_toks

non_cached_cost = num_non_cached_toks * cost_per_input_tok
cached_cost = num_cached_toks * cost_per_cached_tok
input_cost = non_cached_cost + cached_cost
output_cost = num_output_toks * cost_per_output_tok
total_cost = input_cost + output_cost
print("{:<25}{:>2} {:<10.8f}".format("Cost per input token", "$", cost_per_input_tok))
print("{:<25}{:>2} {:<10.8f}".format("Cost per cached token", "$", cost_per_cached_tok))
print("{:<25}{:>2} {:<10.8f}".format("Cost per output token", "$", cost_per_output_tok))
print("{:<25}{:>2} {:<10}".format("Num. input tokens", "", num_input_toks))
print("{:<25}{:>2} {:<10}".format("Num. cached tokens", "", num_cached_toks))
print("{:<25}{:>2} {:<10}".format("Num. non-cached tokens", "", num_non_cached_toks))
print("{:<25}{:>2} {:<10}".format("Num. output tokens", "", num_output_toks))
print("{:<25}{:>2} {:<10.8f}".format("Cost (non-cached input)", "$", non_cached_cost))
print("{:<25}{:>2} {:<10.8f}".format("Cost (cached input)", "$", cached_cost))
print("{:<25}{:>2} {:<10.8f}".format("Cost (input)", "$", input_cost))
print("{:<25}{:>2} {:<10.8f}".format("Cost (output)", "$", output_cost))
print("{:<25}{:>2} {:<10.8f}".format("Cost (total)", "$", total_cost))
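The same calculation can be wrapped in a small helper. This is just a sketch: the parameter names mirror the values above, and the default prices are the per-1M-token rates assumed in this answer, not values fetched from anywhere.

```python
def response_cost(prompt_tokens, cached_tokens, completion_tokens,
                  input_price=0.15, cached_price=0.075, output_price=0.60):
    """Return the USD cost of one response.

    Prices are USD per 1M tokens. prompt_tokens includes cached_tokens,
    so the cached portion is subtracted before applying the full input rate.
    """
    per_tok = 1 / 1_000_000
    non_cached = prompt_tokens - cached_tokens
    return (non_cached * input_price * per_tok
            + cached_tokens * cached_price * per_tok
            + completion_tokens * output_price * per_tok)

print(f"{response_cost(1204, 1024, 12):.8f}")  # 0.00011100
```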
Some definitions:
Some information on how input caching works:
References
Prompt Caching in the API (2024) OpenAI. Available at: https://openai.com/index/api-prompt-caching (Accessed: 20 February 2025).
Upvotes: 0