James

Reputation: 12202

Graphing slow counters with prometheus and grafana

We graph fast counters with sum(rate(my_counter_total[1m])) or with sum(irate(my_counter_total[20s])). The second one is preferable if you can always expect changes within the last couple of seconds.

But how do you graph slow counters where you only have some increments every couple of minutes or even hours? Having values like 0.0013232/s is not very human friendly.

Let's say I want to graph how many users sign up to our service (we expect a couple of signups per hour). What's a reasonable query?

We currently use the following to graph that in grafana:

[Image: Slow counter setup]

Is this reasonable?

I'm still trying to understand how all those parameters play together to draw a graph. Can someone explain how the range selector ([10m]), the rate() and the irate() functions, and the Step and Resolution settings in grafana influence each other?

Upvotes: 20

Views: 13189

Answers (2)

valyala

Reputation: 18084

The query 3600 * sum(rate(signup_total[1h])) can be replaced with sum(increase(signup_total[1h])). The increase(counter[d]) function returns the counter's increase over the given lookbehind window d. E.g. increase(signup_total[1h]) returns the number of signups during the last hour.
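To make the equivalence concrete, here is a minimal Python sketch. It is not Prometheus code: simple_rate and simple_increase are hypothetical helpers, the samples are made up, and the sketch ignores counter resets and the extrapolation that Prometheus applies.

```python
def simple_rate(samples, window_seconds):
    """Per-second rate: counter delta across the samples divided by the window length."""
    return (samples[-1][1] - samples[0][1]) / window_seconds


def simple_increase(samples, window_seconds):
    """Increase over the window, defined here as rate times window length."""
    return simple_rate(samples, window_seconds) * window_seconds


# Hypothetical (timestamp_seconds, counter_value) samples spanning one hour.
samples = [(0, 100), (900, 101), (1800, 101), (2700, 103), (3600, 105)]

per_second = simple_rate(samples, 3600)
per_hour = simple_increase(samples, 3600)

print(f"rate:     {per_second:.7f} signups/s")  # small and hard to read
print(f"increase: {per_hour:.1f} signups/h")    # the same information per hour
```

Multiplying the per-second rate by the window length and asking for the increase over that window give the same number, which is why the per-hour form is easier to read for slow counters.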

Note that the value returned by increase(signup_total[1h]) may be fractional even if signup_total contains only integer values. This is because of extrapolation - see this issue for technical details. There are two ways to get integer results:

  • Use the offset modifier: signup_total - (signup_total offset 1h). This query returns correct results only if signup_total wasn't reset to zero during the last hour. In that case sum(signup_total - (signup_total offset 1h)) is roughly equivalent to sum(increase(signup_total[1h])), but returns exact integer results.
  • Use VictoriaMetrics. It returns the expected integer results from increase() out of the box. See this article and this comment for technical details.
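The difference between the extrapolated increase and the offset subtraction can be sketched in Python. This is a simplified model, not the actual Prometheus implementation: extrapolated_increase and value_at are hypothetical helpers, the scrape interval (60s) and the signup times are assumptions, and the scaling rule is reduced to "stretch the observed delta to the full window".

```python
def value_at(samples, t):
    """Latest sample value at or before t, similar to an instant selector."""
    return [value for ts, value in samples if ts <= t][-1]


def extrapolated_increase(samples, window_start, window_end):
    """Scale the delta between the first and last in-window samples to the full window."""
    in_window = [(ts, v) for ts, v in samples if window_start < ts <= window_end]
    covered = in_window[-1][0] - in_window[0][0]
    delta = in_window[-1][1] - in_window[0][1]
    return delta * (window_end - window_start) / covered


# Assumed setup: one scrape every 60s, offset by 30s from the query time.
signup_times = [4000, 5000, 6500]  # three signups in the last hour
samples = [
    (ts, 100 + sum(1 for s in signup_times if s <= ts))
    for ts in range(30, 7200, 60)
]

now = 7200
extrapolated = extrapolated_increase(samples, now - 3600, now)
offset_diff = value_at(samples, now) - value_at(samples, now - 3600)

print(f"extrapolated increase: {extrapolated:.4f}")
print(f"offset subtraction:    {offset_diff}")
```

In this model the in-window samples cover only 3540 of the 3600 seconds, so the three observed signups are scaled up to slightly more than three, while the subtraction of two raw counter values stays an integer.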

Upvotes: 0

brian-brazil

Reputation: 34172

That's a correct way to do it. You can also use increase(), which is syntactic sugar for using rate() that way.

Can someone explain how the range selector

This is only used by Prometheus, and indicates the time window of samples that the function works over.

the Step and Resolution settings in grafana influence each other?

This is used on the Grafana side. It affects how many time slices Grafana requests from Prometheus.

These settings do not directly influence each other. However, the resolution should work out to be smaller than the range, or you'll be undersampling and will miss information.
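The undersampling effect can be illustrated with a small Python sketch. It is a model of the idea, not of Grafana or Prometheus internals: graph_points is a hypothetical helper, the scrape interval and signup times are made up, and extrapolation is ignored.

```python
def value_at(samples, t):
    """Latest sample value at or before t."""
    return [value for ts, value in samples if ts <= t][-1]


def graph_points(samples, range_seconds, step_seconds, end):
    """Evaluate the increase over the range window once per step, like a graph query."""
    return [
        value_at(samples, t) - value_at(samples, t - range_seconds)
        for t in range(step_seconds, end + 1, step_seconds)
    ]


# Assumed setup: a counter scraped every 60s for two hours, with five signups.
signup_times = [500, 2000, 3100, 4100, 6900]
samples = [
    (ts, sum(1 for s in signup_times if s <= ts))
    for ts in range(0, 7201, 60)
]

# Step equal to the range: the windows tile the whole time axis.
fine = graph_points(samples, range_seconds=600, step_seconds=600, end=7200)
# Step larger than the range: gaps between windows are never looked at.
coarse = graph_points(samples, range_seconds=600, step_seconds=1800, end=7200)

print("signups seen with step <= range:", sum(fine))
print("signups seen with step >  range:", sum(coarse))
```

With a 10-minute range and a 30-minute step, each graph point only looks at the last 10 minutes before it, so signups that fall into the remaining 20 minutes never show up in the graph.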

Upvotes: 9
