Reputation: 79
Say I have a metric request_failures for users. For each user I add a unique label value to the metric. So for user u1, when a request has failed twice, I get the following:
request_failures{user_name="u1"} 2
I also have a rule that fires when there are new failures. Its expression is:
increase(request_failures[1m]) > 0
This works well for a user that already encountered failures. For example, when u1 encounters the third failure, the rule fires.
When a request failed for a new user u2, I get the metrics as:
request_failures{user_name="u1"} 2
request_failures{user_name="u2"} 1
Now the problem is that the alert rule doesn't fire for u2. It seems that the rule cannot recognize a "new" time series, although all of the series belong to the same metric, request_failures, and differ only in their labels.
Can anyone point out how I should construct the rule?
Upvotes: 8
Views: 5635
Reputation: 20176
As already put by @MichaelDoubez, increase() does not consider a newly created metric as a value increase. Unfortunately, the same goes for changes(). There are reasons for that, such as a missing scrape for example, but it can still be solved with a query:
increase(request_failures[10m]) > 0
or
( request_failures unless request_failures offset 10m )
The second part (beginning with or) will fire for 10 minutes (defined by the offset) when there is a new metric.
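To see why the extra or branch matters, here is a small Python sketch that mimics both parts of the query on hand-made samples. It is only a simplified model, not how Prometheus is implemented: the names samples_in, simple_increase, exists_at, firing_users and scrape are hypothetical, the increase is just "last minus first sample in the window" (no extrapolation, no counter-reset handling), and a 5-minute lookback for instant lookups is assumed.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def samples_in(points: List[Sample], start: float, end: float) -> List[Sample]:
    """Samples with start < timestamp <= end."""
    return [(t, v) for t, v in points if start < t <= end]


def simple_increase(points: List[Sample], now: float, window: float) -> Optional[float]:
    """Simplified increase(): last minus first sample in the window.

    Returns None (no result) when fewer than two samples are in the window.
    """
    in_window = samples_in(points, now - window, now)
    if len(in_window) < 2:
        return None
    return in_window[-1][1] - in_window[0][1]


def exists_at(points: List[Sample], t: float, lookback: float = 300.0) -> bool:
    """Simplified instant lookup: any sample within `lookback` seconds before t?"""
    return len(samples_in(points, t - lookback, t)) > 0


def firing_users(series: Dict[str, List[Sample]], now: float, window: float = 600.0) -> Set[str]:
    """Model of: increase(m[10m]) > 0 or (m unless m offset 10m)."""
    firing = set()
    for user, points in series.items():
        inc = simple_increase(points, now, window)
        increased = inc is not None and inc > 0
        # "m unless m offset 10m": present now, absent `window` seconds ago.
        is_new = exists_at(points, now) and not exists_at(points, now - window)
        if increased or is_new:
            firing.add(user)
    return firing


def scrape(start: int, end: int, value_at: Callable[[int], float], step: int = 60) -> List[Sample]:
    """Build one sample every `step` seconds (assumed scrape interval)."""
    return [(float(t), float(value_at(t))) for t in range(start, end + 1, step)]


series = {
    # u1 already had 2 failures and gets a third one at t=1200.
    "u1": scrape(0, 1860, lambda t: 2 if t < 1200 else 3),
    # u2 appears for the first time at t=1200 with 1 failure.
    "u2": scrape(1200, 1860, lambda t: 1),
}

# Plain increase() alone only sees u1 at t=1200.
only_increase = sorted(
    u for u, p in series.items() if (simple_increase(p, 1200.0, 600.0) or 0) > 0
)
print(only_increase)                             # ['u1']
print(sorted(firing_users(series, now=1200.0)))  # ['u1', 'u2']
print(sorted(firing_users(series, now=1860.0)))  # []
```

In this toy data u2 stops firing once its first sample is more than 10 minutes old, which matches the "fires for 10 minutes" behaviour described above.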
Upvotes: 5
Reputation: 6863
The reason the rule doesn't fire is that the increase() function doesn't consider a newly created counter to be 0 before the first scrape. I didn't find any source on that, but it seems to be the case.
Therefore you want to detect two cases:
This can be rephrased in the opposite logic:
an alert should be triggered for a user with errors unless there was no increase in errors in the last N minutes for this user
Which readily translates into the following PromQL:
rule: request_failures > 0 UNLESS increase(request_failures[1m]) == 0
In hindsight, regarding the increase() function: it cannot assume the previous value is 0 because it is evaluated over a range. The previous value may lie outside the range and not be equal to 0. So it makes sense to require at least two points to produce a value.
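As a rough illustration of that reasoning, the Python sketch below models the rule on a few hand-made samples. Everything here is an assumption for the sake of the example: simple_increase and rule_fires are hypothetical helpers, the increase is simply "last minus first sample in the window" (no extrapolation or counter resets), and the current value is taken as the latest sample at or before the evaluation time, ignoring staleness.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def simple_increase(points: List[Sample], now: float, window: float) -> Optional[float]:
    """Simplified increase(): needs at least two samples inside the window."""
    in_window = [v for t, v in points if now - window < t <= now]
    if len(in_window) < 2:
        return None  # no result, like an empty vector
    return in_window[-1] - in_window[0]


def rule_fires(points: List[Sample], now: float, window: float = 60.0) -> bool:
    """Model of: request_failures > 0 unless increase(request_failures[1m]) == 0."""
    seen = [v for t, v in points if t <= now]
    if not seen or seen[-1] <= 0:
        return False
    inc = simple_increase(points, now, window)
    # "unless ... == 0" only removes the series when increase exists and is 0.
    return inc is None or inc != 0


# Existing user: counter goes from 2 to 3 at t=45 (15 s scrape interval assumed).
u1 = [(0.0, 2.0), (15.0, 2.0), (30.0, 2.0), (45.0, 3.0)]
# New user: first sample ever at t=100.
u2 = [(100.0, 1.0), (115.0, 1.0)]
# A counter whose previous sample (value 5) lies outside the 60 s window.
sparse = [(0.0, 5.0), (100.0, 6.0)]

print(rule_fires(u1, now=30.0))   # False: two or more samples, no increase
print(rule_fires(u1, now=45.0))   # True: increase of 1
print(rule_fires(u2, now=100.0))  # True: new series, increase has no result
print(rule_fires(u2, now=115.0))  # False: second sample arrived, increase is 0
print(simple_increase(sparse, now=100.0, window=60.0))  # None
```

The sparse series shows why the previous value cannot be assumed to be 0: the only sample in the window is 6, but the real previous value was 5, so guessing 0 would report an increase of 6 instead of 1.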
Upvotes: 3
Reputation: 79
This should be the answer: https://www.robustperception.io/dont-put-the-value-in-alert-labels.
The key is that a label should not include variable values, as it is part of the identity of a metric. The solution is to add the username as an annotation instead of as a label of the metric.
Upvotes: -1