Reputation: 445
Because Prometheus topk returns more results than expected, and because https://github.com/prometheus/prometheus/issues/586 requires client-side processing that has not yet been made available via https://github.com/grafana/grafana/issues/7664, I'm trying to pursue a different near-term work-around to my similar problem.
In my particular case most of the metric values that I want to graph will be zero most of the time. Only when they are above zero are they interesting.
I can find ways to write Prometheus queries that filter data points based on the value of a label, but I haven't yet found a way to tell Prometheus to return time series data points only if the value of the metric meets a certain condition. In my case, I want to filter for a value greater than zero.
Can I add a condition to a prometheus query that filters data points based on the metric value? If so, where can I find an example of the syntax to do that?
Upvotes: 27
Views: 99668
Reputation: 8544
If you're confused by brian's answer: The result of filtering with a comparison operator is not a boolean, but the filtered series. E.g.
min(flink_rocksdb_actual_delayed_write_rate > 0)
Will show the minimum value above 0.
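To make the "filtered series, not a boolean" point concrete, here is a minimal Python model of how a comparison acts on an instant vector at a single evaluation time. It is only a sketch under simplifying assumptions, not Prometheus code: an instant vector is modeled as a plain dict from a series label string to its value, and the helper names `filter_gt` and `agg_min`, as well as the sample data, are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical model: an instant vector as {series label string: sample value}.
InstantVector = Dict[str, float]


def filter_gt(vec: InstantVector, threshold: float) -> InstantVector:
    """Model of `vec > threshold`: series that fail the comparison are dropped,
    series that pass keep their ORIGINAL value (no 0/1 conversion)."""
    return {series: v for series, v in vec.items() if v > threshold}


def agg_min(vec: InstantVector) -> Optional[float]:
    """Model of `min(vec)`; None stands in for an empty result."""
    return min(vec.values()) if vec else None


# Hypothetical values for three series of flink_rocksdb_actual_delayed_write_rate.
delayed_write_rate = {'{task="a"}': 0.0, '{task="b"}': 120.0, '{task="c"}': 35.5}

print(filter_gt(delayed_write_rate, 0))           # {'{task="b"}': 120.0, '{task="c"}': 35.5}
print(agg_min(filter_gt(delayed_write_rate, 0)))  # 35.5
print(agg_min(delayed_write_rate))                # 0.0 (unfiltered minimum)
```

Without the filter, the all-zero series wins the `min`; with it, only the non-zero series take part.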
In case you actually want a boolean (or rather 0 or 1), use something like
sum (flink_rocksdb_actual_delayed_write_rate >bool 0)
which gives you the number of series whose value is greater than zero.
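The difference from plain filtering is that `>bool` keeps every series and replaces its value with 1 or 0, so summing the result counts the series that pass. The Python sketch below models that under the same simplifying assumptions as a dict-based instant vector; `gt_bool`, `agg_sum` and the data are hypothetical, and label-handling details of real PromQL are ignored.

```python
from typing import Dict

# Hypothetical model: an instant vector as {series label string: sample value}.
InstantVector = Dict[str, float]


def gt_bool(vec: InstantVector, threshold: float) -> InstantVector:
    """Model of `vec >bool threshold`: every series is kept, its value becomes 1.0 or 0.0."""
    return {series: 1.0 if v > threshold else 0.0 for series, v in vec.items()}


def agg_sum(vec: InstantVector) -> float:
    """Model of `sum(vec)` over a non-empty vector."""
    return sum(vec.values())


delayed_write_rate = {'{task="a"}': 0.0, '{task="b"}': 120.0, '{task="c"}': 35.5}

print(gt_bool(delayed_write_rate, 0))           # {'{task="a"}': 0.0, '{task="b"}': 1.0, '{task="c"}': 1.0}
print(agg_sum(gt_bool(delayed_write_rate, 0)))  # 2.0
```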
Upvotes: 34
Reputation: 18084
This can be solved with subqueries:
count_over_time((metric > 0)[5m:10s])
The query above returns the number of metric data points greater than 0 over the last 5 minutes.
This query may return inaccurate results depending on the relation between the second arg in square brackets (aka step for the inner query) and the real interval between raw samples (aka scrape_interval):
- If step exceeds scrape_interval, then some samples may be missed during the calculations. In this case the query returns a lower than expected result.
- If step is smaller than scrape_interval, then some samples may be counted multiple times. In this case the query returns a higher than expected result.
So it is recommended to set step to scrape_interval in order to get accurate results.
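The step vs. scrape_interval effect can be reproduced with a small Python simulation. This is a simplified model, not Prometheus's implementation: it assumes the subquery is evaluated at timestamps aligned to multiples of step, that each evaluation picks the latest raw sample within a 5-minute lookback, and that the counting window is (end - window, end]. The helper names and the data (a 15s scrape interval, values zero until t=150s) are hypothetical.

```python
from typing import List, Optional, Tuple

Sample = Tuple[int, float]  # (timestamp in seconds, value), sorted by timestamp

LOOKBACK = 300  # assumption: default 5m lookback delta for instant lookups


def instant_value(samples: List[Sample], t: int) -> Optional[float]:
    """Simplified instant lookup: latest raw sample in (t - LOOKBACK, t]."""
    latest = None
    for ts, v in samples:
        if t - LOOKBACK < ts <= t:
            latest = v  # later samples overwrite earlier ones
    return latest


def count_over_time_subquery(samples: List[Sample], end: int, window: int,
                             step: int, threshold: float = 0.0) -> int:
    """Simplified model of count_over_time((metric > threshold)[window:step])
    at time `end`: look the series up every `step` seconds and count the
    evaluation points whose value passes the filter."""
    count = 0
    t = (end // step) * step  # evaluation timestamps aligned to multiples of step
    while t > end - window:
        v = instant_value(samples, t)
        if v is not None and v > threshold:
            count += 1
        t -= step
    return count


# Hypothetical data: scraped every 15s, zero until t=150s, positive afterwards.
# There are 11 raw samples > 0 in (0, 300].
scrape_interval = 15
samples = [(ts, 0.0 if ts < 150 else 1.0) for ts in range(0, 301, scrape_interval)]

for step in (30, 15, 10):
    print(step, count_over_time_subquery(samples, end=300, window=300, step=step))
# 30 6   (step > scrape_interval: samples skipped, undercount)
# 15 11  (step == scrape_interval: matches the raw count)
# 10 16  (step < scrape_interval: samples reused, overcount)
```

With perfectly regular scrapes the step-equals-interval case matches the raw count; real scrape timing jitter is not modeled here.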
P.S. The issues mentioned above are solved in VictoriaMetrics, a Prometheus-like monitoring system I work on. It provides the count_gt_over_time() function, which fits this case well. For example, the following MetricsQL query returns the exact number of raw samples with values greater than 0 over the last 5 minutes:
count_gt_over_time(metric[5m], 0)
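Conceptually, counting raw samples avoids the resampling step entirely. The Python sketch below models that idea on the same hypothetical data used for the subquery simulation; it is a simplified illustration (window taken as (end - window, end]), not VictoriaMetrics code, and the function name only mirrors the MetricsQL function for readability.

```python
from typing import List, Tuple

Sample = Tuple[int, float]  # (timestamp in seconds, value)


def count_gt_over_time(samples: List[Sample], end: int, window: int,
                       threshold: float) -> int:
    """Simplified model: count raw samples in (end - window, end] whose value
    exceeds threshold. No resampling step is involved, so there is nothing
    to skip or double count."""
    return sum(1 for ts, v in samples if end - window < ts <= end and v > threshold)


# Hypothetical data: scraped every 15s, zero until t=150s, positive afterwards.
samples = [(ts, 0.0 if ts < 150 else 1.0) for ts in range(0, 301, 15)]
print(count_gt_over_time(samples, end=300, window=300, threshold=0))  # 11
```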
Upvotes: 6
Reputation: 34172
Filtering is done with the comparison operators, for example x > 0.
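For graphing, what matters is that a range query evaluates the expression independently at every step, so at steps where the comparison fails the series simply has no point (a gap in the graph) instead of a 0. The Python sketch below models that for a single series; the function name and the evaluated points are hypothetical, and this is an illustration of the semantics rather than Prometheus code.

```python
from typing import List, Tuple

Point = Tuple[int, float]  # (evaluation timestamp in seconds, value)


def range_query_gt(points: List[Point], threshold: float) -> List[Point]:
    """Model of a range query for `x > threshold` on one series: each step is
    filtered on its own, and failing steps produce no point at all."""
    return [(t, v) for t, v in points if v > threshold]


# Hypothetical values of x at each 60s step of a range query.
evaluated = [(0, 0.0), (60, 3.0), (120, 0.0), (180, 7.5), (240, 0.0)]
print(range_query_gt(evaluated, 0))  # [(60, 3.0), (180, 7.5)]
```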
Upvotes: 25