Reputation: 16754
Quote from Prometheus Count and sum of observations doc:
To calculate the average request duration during the last 5 minutes from a histogram or summary called http_request_duration_seconds, use the following expression:
rate(http_request_duration_seconds_sum[5m]) / rate(http_request_duration_seconds_count[5m])
I should mention that I understand what the rate function does. However, I'm not interested in the rate of increase of the request duration, but rather in the request duration itself!
Can someone explain why everybody looking for an average count or value at a given moment in time has to use the rate function, when it doesn't provide that?
P.S. There is seemingly a duplicate question with an accepted answer; however, all the answers to it explain what the rate function is, how it does what it does, etc. I already understand what the rate function does. I just don't understand why we are supposed to use it in the first place, especially when the result it provides has nothing to do with what we're looking for.
Upvotes: 7
Views: 6200
Reputation: 4819
Let's show that the formula quoted from the Prometheus manual, which uses the rate() function, computes exactly the value you are looking for.
Because of the way a counter works, each time the counter http_request_duration_seconds_sum is updated, it adds the sum of the durations of all the requests that happened since the previous update to its previous value.
Therefore, rate(http_request_duration_seconds_sum[5m]) is the sum of the durations of the requests that occurred during the last 5 minutes, divided by 5 minutes.
Likewise, each time the counter http_request_duration_seconds_count is updated, it adds the number of requests that happened since the previous update to its previous value.
Therefore, rate(http_request_duration_seconds_count[5m]) is the number of requests that occurred during the last 5 minutes, divided by 5 minutes.
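So, let's substitute the results of the two previous paragraphs into the following fraction:
rate(http_request_duration_seconds_sum[5m]) / rate(http_request_duration_seconds_count[5m])
It is equal to:
(sum of request durations during 5 minutes / 5 minutes) / (number of requests during 5 minutes / 5 minutes)
You can simplify this formula by removing 5 minutes, because it is present in both the numerator and the denominator.
Finally, the following formula:
rate(http_request_duration_seconds_sum[5m]) / rate(http_request_duration_seconds_count[5m])
is equal to the following one:
sum of request durations during 5 minutes / number of requests during 5 minutes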
The second part of this equality is the value you want to compute: the average duration of requests during 5 minutes. This is why it is computed using the first part of this equality.
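As a rough numeric check of the cancellation argued above, here is a minimal Python sketch. It assumes a simplified model of rate(), namely (last value - first value) / window length in seconds; real Prometheus also extrapolates to the window boundaries and handles counter resets. The counter values and request durations are made up for illustration.

```python
WINDOW_SECONDS = 300  # corresponds to the [5m] range selector


def simple_rate(first, last, window_seconds=WINDOW_SECONDS):
    """Simplified rate(): per-second increase of a counter over the window."""
    return (last - first) / window_seconds


# Hypothetical counter values at the start of the 5-minute window.
sum_start, count_start = 120.0, 400

# Hypothetical durations (in seconds) of the requests served during the window.
durations = [0.25, 0.5, 0.75, 1.5]

# Each counter only ever adds to its previous value.
sum_end = sum_start + sum(durations)
count_end = count_start + len(durations)

# The documented formula: rate(..._sum[5m]) / rate(..._count[5m])
avg_from_rates = simple_rate(sum_start, sum_end) / simple_rate(count_start, count_end)

# The value the question actually asks for: the mean request duration.
avg_direct = sum(durations) / len(durations)

print(avg_from_rates, avg_direct)
```

Both printed values agree up to floating-point rounding, because the division by WINDOW_SECONDS appears in the numerator and in the denominator.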
Upvotes: 11
Reputation: 18094
Prometheus summary and histogram metric types expose two additional counters:
The first one has the _count suffix appended to the original metric name. For example, if the original histogram or summary metric name is http_request_duration_seconds (see docs for metric naming convention in Prometheus), then the http_request_duration_seconds_count counter is generated, which counts the total number of http requests since the service start.
The second one has the _sum suffix appended to the original metric name. For example, if the original metric name is http_request_duration_seconds, then the http_request_duration_seconds_sum counter contains the total sum of all the http request durations since the service start.
How to calculate the average request duration from these two metrics? An obvious solution is to divide the sum of all the request durations by the number of requests:
http_request_duration_seconds_sum / http_request_duration_seconds_count
But this solution shows the average request duration since the last restart of the service. Usually users are interested in the average request duration over some lookbehind interval, for example over the last 5 minutes. Then we need to divide the sum of all the request durations during the last 5 minutes by the number of requests served during the last 5 minutes. This can be done with the increase function:
increase(http_request_duration_seconds_sum[5m])
/
increase(http_request_duration_seconds_count[5m])
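To illustrate why the lookbehind window matters, here is a small Python sketch that contrasts the since-start average with the 5-minute average. It assumes increase() is simply the difference between the last and first counter samples in the window (real Prometheus also extrapolates and handles counter resets), and all sample values are hypothetical.

```python
def average(sum_delta, count_delta):
    """Average duration: total duration divided by number of requests."""
    return sum_delta / count_delta


# Hypothetical (_sum, _count) counter samples.
five_minutes_ago = (1000.0, 10000)  # long history of fast requests (0.1 s each)
now = (1200.0, 10100)               # last 5 minutes: 100 requests taking 200 s in total

# Dividing the raw counters gives the average since the service start.
lifetime_avg = average(now[0], now[1])

# Dividing the increases gives the average over the last 5 minutes only.
window_avg = average(now[0] - five_minutes_ago[0], now[1] - five_minutes_ago[1])

print(lifetime_avg)  # dominated by the old fast requests, well under 1 second
print(window_avg)    # reflects the recent slowdown: 2.0 seconds per request
```

The since-start average barely moves when recent requests slow down, which is why the windowed form is usually what dashboards and alerts need.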
The rate function in Prometheus is calculated as rate(m[d]) = increase(m[d])/d, i.e. it is increase() divided by the lookbehind window d (in seconds). Let's substitute increase with rate in the query above:
rate(http_request_duration_seconds_sum[5m])
/
rate(http_request_duration_seconds_count[5m])
Now let's substitute rate(m[d]) with increase(m[d])/d according to the formula above:
(increase(http_request_duration_seconds_sum[5m])/5m)
/
(increase(http_request_duration_seconds_count[5m])/5m)
The 5m denominators cancel out, so we end up with the initial query with increase(). So it is OK to use either rate or increase in the query above; this shouldn't change the result.
Upvotes: 8