Reputation: 29149
I'm running Prometheus in a Kubernetes cluster. Everything is running fine and my UI pods are counting visitors.
Please ignore the title; what you see here is the query at the bottom of the image. It's a counter. The gaps in the graph are due to pods restarting. I have two pods running simultaneously!
Now suppose I would like to count the total number of visitors, so I need to sum over all the pods.
This is what I expect considering the first image, right?
However, I don't want the graph to drop when a pod restarts. I would like to have something cumulative over a specified amount of time (somehow ignoring pod restarts). I hope this makes sense. Any suggestions?
UPDATE
Below, it is suggested to do the following.
It's a bit hard to see because I've plotted everything there, but the suggested answer sum(rate(NumberOfVisitors[1h])) * 3600
is the continuous green line there. What I don't understand now is why it has a value of 3. Also, why does the value only increase after 21:55, when I can see some values before that?
The approach seems to be OK: I noticed that the actual increase is indeed 3, going from 1 to 4. In the graph below I've used just one time series to reduce noise.
Upvotes: 6
Views: 6955
Reputation: 17890
Prometheus doesn't provide the ability to sum counters that may be reset. Additionally, the increase() function in Prometheus has some issues that may prevent it from being used for querying the counter increase over a specified time range:
For example, increase(NumberOfVisitors[1m]) at timestamp t may miss the counter increase between the last raw sample just before the t-1m time and the first raw sample in the (t-1m ... t] time range. See more details here and here.
Likewise, if the NumberOfVisitors counter is increased to 10 just before the first scrape of this counter by Prometheus, then increase() over the time range with the first sample would under-count the counter increase by 10.
Prometheus developers are going to fix these issues - see this design doc. In the meantime it is possible to use VictoriaMetrics - its increase() function is free from these issues.
Returning to the original question - the sum of multiple counters, which may be reset, can be returned with the following MetricsQL query in VictoriaMetrics:
running_sum(sum(increase(NumberOfVisitors)))
It uses the following functions: increase, sum and running_sum.
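As a rough illustration of the idea behind that query (not of how VictoriaMetrics implements it), the Python sketch below takes per-step increases for two pods, adds them up across pods, and then accumulates the result over time. The pod names and numbers are hypothetical, and the per-step increases are assumed to be already corrected for resets.

```python
from itertools import accumulate

# Hypothetical per-step increases for two pods, assumed to be already
# reset-corrected (a restart shows up as "no increase", not as a drop).
pod_a_increase = [1, 2, 1, 2]
pod_b_increase = [1, 2, 0, 2]

# Analogue of sum(...): add the increases of all pods at each step.
summed = [a + b for a, b in zip(pod_a_increase, pod_b_increase)]

# Analogue of running_sum(...): cumulative total over the graphed range.
cumulative = list(accumulate(summed))

print(summed)      # [2, 4, 1, 4]
print(cumulative)  # [2, 6, 7, 11]
```

Because every per-step increase is non-negative, the cumulative line never drops, which is the behaviour asked for in the question.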
Upvotes: 1
Reputation: 54221
Rate, then sum, then multiply by the time range in seconds. That will handle rollovers on counters too.
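The order matters because reset detection only works on a single series. The Python sketch below is a simplified model of that reasoning, not Prometheus's actual implementation: it treats any drop in a counter as a reset to zero and ignores the extrapolation that rate() also performs. The helper name counter_increase and the sample values are hypothetical.

```python
def counter_increase(samples):
    """Total increase of one counter series, treating any drop as a reset to zero."""
    total = 0
    for prev, cur in zip(samples, samples[1:]):
        # After a reset the counter restarts from 0, so the whole new value
        # counts as increase; otherwise only the difference does.
        total += cur - prev if cur >= prev else cur
    return total

# Hypothetical scrapes from two pods; pod_b restarts after its third sample.
pod_a = [1, 2, 4, 5, 7]
pod_b = [3, 4, 6, 0, 2]

# Increase per series first, then sum: each reset is seen in isolation.
per_pod_then_sum = counter_increase(pod_a) + counter_increase(pod_b)

# Sum the raw counters first: the reset of pod_b is hidden inside the total.
summed_first = [a + b for a, b in zip(pod_a, pod_b)]
naive_difference = summed_first[-1] - summed_first[0]
reset_logic_on_sum = counter_increase(summed_first)

print(per_pod_then_sum)    # 11
print(naive_difference)    # 5
print(reset_logic_on_sum)  # 15
```

In this model only the per-series calculation gives the true number of new visitors (11). Summing first either under-counts, because the drop cancels real increases, or over-counts, because the drop in the total is mistaken for a full reset of every pod.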
Upvotes: 4