Amir Bar

Reputation: 3105

Prometheus: how to handle counters on the server

I have articles, and for each article I want to keep a read count:

# TYPE news_read_counter2 counter
news_read_counter2{id="2000"} 168

The counters on the servers are stored in Redis/Memcached, so they can get reset from time to time: after a while the Redis machine restarts, the server no longer has the last news_read_counter value, and the count starts from zero again:

# TYPE news_read_counter2 counter
news_read_counter2{id="2000"} 2

Now, looking at the news_read_counter2{id="2000"} graph, I see the counter drop to 2, while the docs say:

A counter is a cumulative metric that represents a single numerical value that only ever goes up.

So to keep track of news_read_counter I would need to save the data in a database, and I'm back where I started, using MySQL to handle my data.

Here is an image of the counter after Redis restarted: [image]

Upvotes: 23

Views: 39979

Answers (3)

valyala

Reputation: 17784

It is OK if the counter is reset to zero on service restart, since Prometheus provides the increase() and rate() functions, which remove counter resets before performing the actual calculations. Prometheus counters usually need to be wrapped in these functions to get meaningful results. For example:

  • increase(news_read_counter2[24h]) returns the number of news reads for the last 24 hours
  • rate(news_read_counter2[1h]) returns the average per-second news read rate for the last hour
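To make the reset handling concrete, here is a minimal Python sketch of the idea behind a reset-aware increase. It is not Prometheus's implementation: it ignores timestamps and extrapolation, and the function name `reset_aware_increase` is hypothetical. Whenever a sample is lower than the previous one, the sketch assumes the counter restarted from zero (as after the Redis restart in the question) and counts the new value as increase.

```python
from typing import Sequence


def reset_aware_increase(values: Sequence[float]) -> float:
    """Sum the increases between consecutive counter samples.

    A drop in value is treated as a counter reset: the counter is assumed to
    have restarted from zero, so the whole new value counts as increase.
    Simplified sketch: no timestamps, no extrapolation.
    """
    total = 0.0
    for prev, cur in zip(values, values[1:]):
        if cur >= prev:
            total += cur - prev
        else:
            # Reset detected (e.g. after a Redis restart): restarted from 0.
            total += cur
    return total


# Hypothetical samples around a reset: 160 -> 165 -> 168, restart, 2 -> 5.
print(reset_aware_increase([160, 165, 168, 2, 5]))  # 13.0
```

Dividing such an increase by the window length in seconds gives a rough per-second average, which is what rate() reports (again ignoring extrapolation).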

If you need an absolute counter value with counter resets removed, this can be done with increase(news_read_counter2[10y]). This query returns the total number of news reads over the last 10 years. Prometheus evaluates the query independently for each point displayed on the graph, so the query displays a non-decreasing graph with the absolute number of news reads since the first news read within the last 10 years. Note that an increase() query with a very large lookbehind window in square brackets may be slow, since it needs to process all the raw samples stored in Prometheus for time series named news_read_counter2.
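A minimal sketch, assuming the samples of one series are available as a plain Python list, of how resets can be "removed" to get a non-decreasing series. `remove_resets` is a hypothetical helper, not a Prometheus function: at each drop it adds the last value seen before the drop as an offset to all later samples.

```python
from typing import List, Sequence


def remove_resets(values: Sequence[float]) -> List[float]:
    """Return the series with counter resets removed (never decreases).

    At each drop, the last value seen before the drop is added to a running
    offset that is applied to all later samples.
    """
    adjusted: List[float] = []
    offset = 0.0
    prev = None
    for v in values:
        if prev is not None and v < prev:
            offset += prev  # counter restarted from zero after reaching `prev`
        adjusted.append(v + offset)
        prev = v
    return adjusted


series = remove_resets([160, 165, 168, 2, 5])
print(series)  # [160.0, 165.0, 168.0, 170.0, 173.0]
print(series[-1] - series[0])  # 13.0
```

The difference `series[-1] - series[0]` is the reset-free increase over the whole series; like Prometheus's increase(), it does not include the value the series already had at its first sample.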

Note that the increase() function in Prometheus has some issues:

  • It may return fractional results over integer counters because of extrapolation. See this issue for details.
  • It misses the potential counter increase between the last raw sample before the lookbehind window in square brackets and the first raw sample inside the lookbehind window.
  • It misses the initial counter increase if the time series starts from a non-zero sample.
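The first issue can be reproduced with a simplified model of extrapolation. The sketch below (hypothetical `extrapolated_increase`, a simplification that omits Prometheus's limits on how far it extrapolates toward the window edges) uses only the samples inside the window and scales the raw increase between the first and last of them up to the full window length. Because the sample before the window is never consulted, it also illustrates the mechanism behind the second issue.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def extrapolated_increase(
    samples: List[Sample], window_start: float, window_end: float
) -> Optional[float]:
    """Simplified model of an extrapolating increase() over (window_start, window_end].

    Only samples inside the window are used; the raw increase between the
    first and last of them is scaled up to the full window length. This
    omits Prometheus's limits on extrapolating toward the window edges.
    """
    inside = [(t, v) for t, v in samples if window_start < t <= window_end]
    if len(inside) < 2:
        return None  # not enough samples to compute an increase
    raw = 0.0
    for (_, prev), (_, cur) in zip(inside, inside[1:]):
        raw += cur - prev if cur >= prev else cur  # a drop is a reset
    sampled_span = inside[-1][0] - inside[0][0]
    return raw * (window_end - window_start) / sampled_span


# Hypothetical scrapes every 15 s; the counter grows by whole numbers only.
samples = [(5, 10), (20, 11), (35, 12), (50, 14)]
print(extrapolated_increase(samples, 0, 60))  # about 5.33, not an integer
```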

These issues should be fixed eventually according to this design doc. In the meantime you can try VictoriaMetrics, a Prometheus-like monitoring system I work on. It supports MetricsQL, a PromQL-like query language whose increase() function is free from the issues mentioned above.

P.S. If you need to draw a non-decreasing graph that starts from zero at the left side and shows the cumulative counter increase over any selected time range, Prometheus cannot help with this case :( But VictoriaMetrics can. For example, the following MetricsQL query returns the cumulative counter increase over any selected time range:

running_sum(increase(news_read_counter2))

The query uses the running_sum function.

The query also uses a VictoriaMetrics feature that allows skipping the lookbehind window in square brackets for increase() (and any other rollup function). In this case it automatically uses the interval between points on the graph (aka step) as the lookbehind window, so all the raw samples are taken into account by the query.
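The idea behind `running_sum(increase(news_read_counter2))` can be sketched in Python as follows. This is not VictoriaMetrics code: the helpers are hypothetical, samples are assumed sorted by time, and the very first sample is assumed to serve only as a baseline. Each step's increase uses the last sample before that step as its baseline, so no increase is lost between adjacent steps, and the cumulative sum only counts increases inside the selected range.

```python
from itertools import accumulate
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def increase_per_step(
    samples: List[Sample], start: float, end: float, step: float
) -> List[float]:
    """Reset-aware increase for each interval (t - step, t], t = start + k*step <= end.

    The last sample before each interval is its baseline, so no increase is
    lost between adjacent intervals. Assumptions: samples are sorted by time,
    and the very first sample only serves as a baseline.
    """
    out: List[float] = []
    prev: Optional[float] = None  # last counter value consumed so far
    idx = 0
    # Samples at or before `start` only establish the initial baseline.
    while idx < len(samples) and samples[idx][0] <= start:
        prev = samples[idx][1]
        idx += 1
    for k in range(1, int((end - start) // step) + 1):
        t = start + k * step
        inc = 0.0
        while idx < len(samples) and samples[idx][0] <= t:
            value = samples[idx][1]
            if prev is not None:
                inc += value - prev if value >= prev else value  # drop == reset
            prev = value
            idx += 1
        out.append(inc)
    return out


def running_sum_of_increase(
    samples: List[Sample], start: float, end: float, step: float
) -> List[float]:
    """Cumulative sum of the per-step increases within the selected range."""
    return list(accumulate(increase_per_step(samples, start, end, step)))


samples = [(0, 160), (10, 165), (20, 168), (30, 2), (40, 5), (50, 9)]
print(running_sum_of_increase(samples, start=0, end=60, step=20))  # [8.0, 13.0, 17.0]
```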

Upvotes: 10

aggregate1166877

Reputation: 3150

You generally don't want to look at the total of a counter the way you do in your example, because it's not very meaningful once you actually try to use it analytically.

The idea is that you want to know increases over a period of time. For example, you might want to know the total number of article views for the last 7 days, for this month so far, or for the last 30 days.

This answer and this article do an excellent job of explaining all this, but here are some examples. For demonstration purposes I use a counter called walks_started_total.

The problem

Query: `walks_started_total`

[image: graph of the query result]

Solution 1

Total for the last week: `increase(walks_started_total[1w])`

[image: graph of the query result]

Solution 2

Over a 1 minute period: `increase(walks_started_total[1m])`

[image: graph of the query result]
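Each point on graphs like these is an independent evaluation of the query over its own lookbehind window ending at that point. A minimal sketch of that evaluation model, with hypothetical helper names, made-up sample data for `walks_started_total`, and without Prometheus's extrapolation:

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def window_increase(samples: List[Sample], t: float, window: float) -> float:
    """Reset-aware increase from the samples in (t - window, t], without extrapolation."""
    values = [v for ts, v in samples if t - window < ts <= t]
    inc = 0.0
    for prev, cur in zip(values, values[1:]):
        inc += cur - prev if cur >= prev else cur  # a drop is a reset
    return inc


def graph_points(
    samples: List[Sample], window: float, eval_times: List[float]
) -> List[Tuple[float, float]]:
    """Evaluate the windowed increase independently at every graph timestamp."""
    return [(t, window_increase(samples, t, window)) for t in eval_times]


# Made-up walks_started_total samples, scraped every 15 s.
walks = [(0, 0), (15, 1), (30, 1), (45, 3), (60, 4), (75, 4), (90, 6)]
print(graph_points(walks, window=60, eval_times=[60, 90]))  # [(60, 3.0), (90, 3.0)]
print(window_increase(walks, t=90, window=90))  # 5.0
```

With a 90-second window evaluated at t=90 the result is 5, even though the counter went from 0 to 6, because the sample at t=0 falls just outside the window.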

Upvotes: 15

brian-brazil

Reputation: 34112

Counters are allowed to be reset to 0, so there's no need to do anything special here to handle it. See http://www.robustperception.io/how-does-a-prometheus-counter-work/ for more detail.

It's recommended to use a client library, which will handle all of this for you.

Also, by convention counters should have the _total suffix, so that metric should be named news_reads_total.
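To show what the convention looks like in the scraped output, here is a toy Python sketch that renders one counter in the Prometheus text exposition format. `render_counter` is hypothetical and only illustrates the output shape (the `_total` suffix and the lowercase `counter` type); in practice, as recommended above, a client library generates this for you.

```python
from typing import Dict


def render_counter(name: str, help_text: str, values: Dict[str, float], label: str = "id") -> str:
    """Render one counter family in the Prometheus text exposition format.

    Toy illustration only: real client libraries also escape label values,
    support multiple labels, and so on.
    """
    if not name.endswith("_total"):
        name += "_total"  # naming convention for counters
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for label_value, v in sorted(values.items()):
        lines.append(f'{name}{{{label}="{label_value}"}} {v}')
    return "\n".join(lines) + "\n"


print(render_counter("news_reads", "Number of article reads.", {"2000": 168}))
```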

Upvotes: 19
