James

Reputation: 12202

Why does increase() return a value of 1.33 in prometheus?

We graph a timeseries with sum(increase(foo_requests_total[1m])) to show the number of foo requests per minute. Requests come in quite sporadically - just a couple of requests per day. The value that is shown in the graph is always 1.3333. Why is the value not 1? There was one request during this minute.

Upvotes: 38

Views: 18260

Answers (2)

valyala

Reputation: 18084

Prometheus calculates increase(foo_requests_total[1m]) at a timestamp t in the following way:

  1. For each time series named foo_requests_total, it selects all the raw samples on the time range (t-1m ... t]. Note that a sample at exactly t-1m is excluded from the selection, while a sample at exactly t is included.
  2. It calculates the difference d between the last and the first raw sample on the selected time range (Prometheus may also remove possible counter resets, but let's skip this step for the sake of clarity).
  3. It extrapolates the calculated difference d if the first and/or the last raw sample are located too far from the bounds of the selected time range.

The last step may result in fractional increase() values over integer counters as seen in the original question. See this issue for more details. Note also that increase() in Prometheus misses the difference between the first raw sample on the selected time range and the previous sample before the selected time range. This may result in smaller than expected increase() results.
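To make the three steps concrete, here is a rough Python sketch of the calculation. It is a simplification, not the actual Prometheus code: it skips counter-reset handling and other details, the 1.1 threshold factor is my reading of the Prometheus source and may differ between versions, and the 15-second scrape interval and sample values are made up for illustration. The name increase_sketch is hypothetical.

```python
def increase_sketch(series, range_start, range_end):
    """Simplified model of increase() over (range_start, range_end].

    series: list of (timestamp_seconds, counter_value), sorted by time.
    Counter resets are ignored (an assumption made for brevity).
    """
    # Step 1: keep samples in the half-open window (range_start, range_end].
    samples = [(t, v) for t, v in series if range_start < t <= range_end]
    if len(samples) < 2:
        return None  # not enough samples to compute a difference

    # Step 2: difference between the last and the first sample in the window.
    first_t, first_v = samples[0]
    last_t, last_v = samples[-1]
    delta = last_v - first_v

    # Step 3: extrapolate towards the window bounds.
    sampled_interval = last_t - first_t
    avg_gap = sampled_interval / (len(samples) - 1)
    threshold = avg_gap * 1.1  # assumed constant, see the note above
    to_start = first_t - range_start
    to_end = range_end - last_t

    extrapolate_to = sampled_interval
    extrapolate_to += to_start if to_start < threshold else avg_gap / 2
    extrapolate_to += to_end if to_end < threshold else avg_gap / 2
    return delta * extrapolate_to / sampled_interval


# Hypothetical counter scraped every 15 s; one request arrives between t=25 and t=40.
series_a = [(-5, 5), (10, 5), (25, 5), (40, 6), (55, 6)]
extrapolated = increase_sketch(series_a, 0, 60)
print(round(extrapolated, 4))  # 1.3333

# Same counter, but the request arrives between t=-5 and t=10,
# i.e. between the last sample before the window and the first one inside it.
series_b = [(-5, 5), (10, 6), (25, 6), (40, 6), (55, 6)]
missed = increase_sketch(series_b, 0, 60)
print(missed)  # 0.0
```

In the first case the four samples inside the window span only 45 seconds, so the difference of 1 is scaled by 60/45, which gives the 1.3333 from the question. In the second case the increment happens before the first sample in the window, so the sketch reports no increase at all.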

Prometheus developers are going to fix these issues - see this design doc. In the meantime, try VictoriaMetrics - its increase() function returns the expected integer result over integer counters, without any extrapolation.

Upvotes: 8

brian-brazil

Reputation: 34172

The challenge with calculating this number is that we only have a few data points inside a time range, and they tend not to be at the exact start and end of that time range (1 minute here). What do we do about the time between the start of the time range and the first data point, and likewise between the last data point and the end of the range?

We do a small bit of extrapolation to smooth this out and produce the correct result in aggregate. For very slow-moving counters like this one, the extrapolation can cause artifacts.
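As a rough illustration of why the extrapolation helps for busier counters, here is a small Python sketch. It assumes a hypothetical counter that grows by exactly 1 per second, a 15-second scrape interval, and a simplified rule that scales the observed difference to the full window; none of these numbers come from the question.

```python
# Hypothetical counter growing by 1 per second, scraped every 15 s (assumed).
window_seconds = 60
scrape_times = [10, 25, 40, 55]  # samples that fall inside the 1-minute window
samples = [(t, 1000 + t) for t in scrape_times]

# The samples only cover 45 of the 60 seconds in the window.
raw_delta = samples[-1][1] - samples[0][1]
covered_seconds = samples[-1][0] - samples[0][0]

# Simplified extrapolation: scale the observed difference to the whole window.
extrapolated = raw_delta * window_seconds / covered_seconds

print(raw_delta)      # 45
print(extrapolated)   # 60.0
```

Without extrapolation the window would report 45 requests, even though 60 happened during that minute. The same 60/45 scaling applied to a single request is what produces 1.3333 for a counter that barely moves.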

Upvotes: 21
