Reputation: 445

Filter prometheus results by metric value, not by label value

Prometheus topk returns more results than expected, and https://github.com/prometheus/prometheus/issues/586 requires client-side processing that has not yet been made available via https://github.com/grafana/grafana/issues/7664. I'm therefore looking for a different near-term workaround for my similar problem.

In my particular case most of the metric values that I want to graph will be zero most of the time. Only when they are above zero are they interesting.

I can find ways to write Prometheus queries that filter data points based on the value of a label, but I haven't yet found a way to tell Prometheus to return time series data points only if the value of the metric meets a certain condition. In my case, I want to keep only values greater than zero.

Can I add a condition to a prometheus query that filters data points based on the metric value? If so, where can I find an example of the syntax to do that?

Upvotes: 27

Answers (3)

Caesar

Reputation: 8544

If you're confused by brian's answer: the result of filtering with a comparison operator is not a boolean, but the filtered series. For example,

min(flink_rocksdb_actual_delayed_write_rate > 0)

will show the minimum value among the series whose value is above 0.

In case you actually want a boolean (or rather 0 or 1), use something like

sum (flink_rocksdb_actual_delayed_write_rate >bool 0)

which will give you the greater-than-zero count.
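
To make the difference between the two forms concrete, here is a minimal Python sketch that models an instant vector as a plain dict. The series names and values are made up for illustration, and the helper functions are hypothetical stand-ins for the PromQL operators, not part of any Prometheus client library.

```python
# Toy model of one instant vector: series name -> current sample value.
# The names and values are invented for illustration only.
samples = {"task-a": 0.0, "task-b": 3.5, "task-c": 0.0, "task-d": 1.2}


def filter_gt(vector, threshold):
    """Model of `vector > threshold`: drop non-matching series, keep values."""
    return {name: value for name, value in vector.items() if value > threshold}


def bool_gt(vector, threshold):
    """Model of `vector > bool threshold`: keep every series, value is 0 or 1."""
    return {name: 1.0 if value > threshold else 0.0 for name, value in vector.items()}


filtered = filter_gt(samples, 0)   # only the series above zero survive
flags = bool_gt(samples, 0)        # every series survives, as 0 or 1

min_above_zero = min(filtered.values())   # models min(metric > 0)
count_above_zero = sum(flags.values())    # models sum(metric > bool 0)
```

In this model the plain comparison shrinks the set of series, while the bool form keeps all of them and only changes the values, which is why summing it yields a count.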

Upvotes: 34

valyala

Reputation: 18084

This can be solved with subqueries:

count_over_time((metric > 0)[5m:10s])

The query above would return the number of metric data points greater than 0 over the last 5 minutes.

This query may return inaccurate results depending on the relation between the second arg in square brackets (aka step for the inner query) and the real interval between raw samples (aka scrape_interval):

If the step exceeds the scrape_interval, then some samples may be skipped during the calculation. In this case the query returns a lower result than expected.
If the step is smaller than the scrape_interval, then some samples may be counted multiple times. In this case the query returns a higher result than expected.

So it is recommended to set the step to the scrape_interval in order to get accurate results.
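
The effect of the step can be illustrated with a rough Python simulation. It is a simplified model, assuming that each subquery evaluation point sees the most recent raw sample at or before it, and it ignores staleness handling and window boundary details. The function name, the sample values and the intervals are all made up for the example.

```python
def subquery_count(sample_values, scrape_interval, step, window):
    """Count evaluation points whose latest raw sample is greater than 0.

    Raw samples are assumed to sit at t = 0, scrape_interval, 2 * scrape_interval, ...
    Evaluation points are at t = 0, step, 2 * step, ... below `window`.
    """
    count = 0
    t = 0
    while t < window:
        index = t // scrape_interval  # latest raw sample at or before t
        if index < len(sample_values) and sample_values[index] > 0:
            count += 1
        t += step
    return count


# Six raw samples scraped every 10 seconds; two of them are above zero.
values = [0, 5, 0, 0, 2, 0]

exact = subquery_count(values, scrape_interval=10, step=10, window=60)
too_fine = subquery_count(values, scrape_interval=10, step=5, window=60)
too_coarse = subquery_count(values, scrape_interval=10, step=20, window=60)
```

With these made-up numbers, a step equal to the scrape interval counts each raw sample once, a step of half the interval sees each sample twice, and a step of twice the interval skips every other sample.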

P.S. The issues mentioned above are solved in VictoriaMetrics, a Prometheus-like monitoring system I work on. It provides the count_gt_over_time() function, which fits this case well. For example, the following MetricsQL query returns the exact number of raw samples with values greater than 0 over the last 5 minutes:

count_gt_over_time(metric[5m], 0)

Upvotes: 6

brian-brazil

Reputation: 34172

Filtering is done with the comparison operators, for example x > 0.

Upvotes: 25
