Shibankar

Reputation: 846

PromQL: Is it possible to get a total count with query_range?

For example, I have a Prometheus query which returns "1" on HTTP status 200 and "0" on any other HTTP status. I am using the query_range API, where I pass the time range (start and end) and the step.

API-Endpoint: http://my-prometheus.com/api/v1/query_range
Query: http_response_ok{appname="XXX"}
Start: 2020-06-17T00:00:00
End: 2020-06-17T23:59:59
Step: 300000ms (= 5min)

The above query returns a data point for every 5 minutes of the entire day, each with the value "0" or "1". That is approximately 289 points in total.

Is it possible to get the total count of all "1" and "0" values for that specific time period? I have tried count_over_time, which gives the total count of samples. How can I add a filter so that it returns the count only where the value == 0, or only where the value == 1?

count_over_time(http_response_ok{appname="XXX"}[24h])

FYI, the actual query is not http_request, and I can't use http_request_total.

Upvotes: 1

Views: 3917

Answers (2)

valyala

Reputation: 17784

Note that /api/v1/query_range returns calculated results rather than the raw samples stored in the database. It returns exactly 1 + (end - start) / step samples on the [start ... end] time range with a step interval between them, where start, end and step are the corresponding query args passed to /api/v1/query_range. See these docs for details on how Prometheus calculates the returned results. If you need the raw samples, send a query with a range selector to /api/v1/query instead. For example, /api/v1/query?query=http_response_ok[24h]&time=t returns the raw samples on the time range (t-24h ... t]. See this article for details.
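
The point-count formula above can be checked with a small Python sketch. It assumes the division is an integer (floor) division, which the answer does not state explicitly, and the function name expected_points is made up for illustration. For the request in the question (00:00:00 to 23:59:59 with a 5 minute step) it gives 288 points; it would give 289 if end were 2020-06-18T00:00:00.

```python
from datetime import datetime, timedelta


def expected_points(start: datetime, end: datetime, step: timedelta) -> int:
    """Number of evaluation timestamps in [start, end] spaced by step.

    Assumption: 1 + (end - start) / step uses floor division.
    """
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    if end < start:
        raise ValueError("end must not be before start")
    # timedelta // timedelta yields an int (floor division).
    return 1 + (end - start) // step


# Values taken from the question.
start = datetime(2020, 6, 17, 0, 0, 0)
end = datetime(2020, 6, 17, 23, 59, 59)
print(expected_points(start, end, timedelta(minutes=5)))
```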

If the http_response_ok time series can have only 0 or 1 values, then the following queries can be used for returning the exact number of raw samples with 0 and 1 values:

  • The number of raw samples with 1 value over the last 24 hours:
avg_over_time(http_response_ok[24h]) * count_over_time(http_response_ok[24h])
  • The number of raw samples with 0 value over the last 24 hours:
(1 - avg_over_time(http_response_ok[24h])) * count_over_time(http_response_ok[24h])

How do these queries work? They use the avg_over_time() function to calculate the average value of the raw samples over the last 24 hours. Internally this value is calculated as sum(raw_samples) / count(raw_samples). The result is then multiplied by count_over_time(), which returns the number of raw samples over the last 24 hours, i.e. it equals count(raw_samples).

So the first query is equivalent to sum(raw_samples) / count(raw_samples) * count(raw_samples) = sum(raw_samples). Since raw_samples can contain only 0 and 1 values, sum(raw_samples) = count(raw_samples_equal_to_1).

The second query is equivalent to (1 - sum(raw_samples)/count(raw_samples)) * count(raw_samples) = count(raw_samples) - sum(raw_samples) = count(raw_samples) - count(raw_samples_equal_to_1) = count(raw_samples_equal_to_0).

If the http_response_ok time series can contain values other than 0 and 1, then the queries listed above won't work. In this case the count_gt_over_time, count_le_over_time, count_eq_over_time and count_ne_over_time functions from MetricsQL may help.
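
The algebra can be replayed on a plain list of samples. The Python sketch below is only an illustration: the sample values are made up, the helper name count_ones_and_zeros is hypothetical, and Fraction is used to keep the arithmetic exact, whereas Prometheus works in floating point.

```python
from fractions import Fraction


def count_ones_and_zeros(raw_samples):
    """Mirror the two queries for a list that contains only 0 and 1."""
    if not raw_samples:
        raise ValueError("need at least one sample")
    if any(v not in (0, 1) for v in raw_samples):
        raise ValueError("the identity only holds for 0/1 samples")
    count = len(raw_samples)                 # count_over_time(...)
    avg = Fraction(sum(raw_samples), count)  # avg_over_time(...)
    ones = avg * count                       # first query
    zeros = (1 - avg) * count                # second query
    return int(ones), int(zeros)


# Made-up samples, not real data.
samples = [1, 1, 0, 1, 0, 1, 1, 1]
ones, zeros = count_ones_and_zeros(samples)
print(ones, zeros)
```

Both results agree with counting the values directly, e.g. samples.count(1) and samples.count(0).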

Upvotes: 0

Shibankar

Reputation: 846

After doing some research I was able to find the answer. Basically, inside the {} we match on labels; outside the {} we can put a condition on the values.

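So, to find the total count of points where the value == 1 over the past 24 hours, the query should look like this (the comparison goes outside the {}, and the whole filtered expression is wrapped in a subquery):

count_over_time((http_response_ok{appname="XXX"} == 1)[24h:])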

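And to find the total count of points where the value == 0 over the past 24 hours, the query should look like this:

count_over_time((http_response_ok{appname="XXX"} == 0)[24h:])

Note that a [24h:] subquery without an explicit resolution is evaluated at the default evaluation interval, so the result counts evaluated points rather than raw samples.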
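
As a cross-check, the same tally can also be done client-side on the JSON body returned by /api/v1/query_range. The Python sketch below assumes the standard "matrix" response shape, in which each series carries a list of [timestamp, "value"] pairs with the value as a string. The function name tally_values and the example payload are made up for illustration.

```python
from collections import Counter


def tally_values(response: dict) -> dict:
    """Count how often each sample value occurs, per series, in a matrix result."""
    if response.get("status") != "success":
        raise ValueError("query did not succeed")
    data = response["data"]
    if data["resultType"] != "matrix":
        raise ValueError("expected a matrix result from query_range")
    tallies = {}
    for series in data["result"]:
        # Use the sorted label set as a hashable key for the series.
        key = tuple(sorted(series["metric"].items()))
        # Each entry in "values" is [unix_timestamp, "value_as_string"].
        tallies[key] = Counter(value for _ts, value in series["values"])
    return tallies


# Hand-written example payload, not real data.
example = {
    "status": "success",
    "data": {
        "resultType": "matrix",
        "result": [
            {
                "metric": {"__name__": "http_response_ok", "appname": "XXX"},
                "values": [[1592352000, "1"], [1592352300, "0"], [1592352600, "1"]],
            }
        ],
    },
}

for labels, counts in tally_values(example).items():
    print(dict(labels), counts["1"], counts["0"])
```

This counts the points that query_range returned for the chosen step, which is not necessarily the number of raw samples stored in the database.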

Upvotes: 2
