Luiz E.

Reputation: 7279

How to group `http_server_requests_total` metrics in Prometheus

I have just instrumented my app and I'd like to show how many hits a certain endpoint has had.

I'm currently using the Ruby client, so I get this metric out of the box with certain tags: host, method, region, instance, and app (app is always the same).

I don't really care about separating it by region or method. I just want to know how many hits the endpoint has had, so I wrote a query like this:

http_server_requests_total{app="sumiu-web", path="/metrics"}

I see that Prometheus still groups them in different "tags":

[screenshot of the query result]

These numbers look right, since a new deployment will spin up a new instance with different tags.

Now, I thought I could just sum them up, but I get a completely different number from what the total should be:

sum by(app) (http_server_requests_total{app="sumiu-web", path="/metrics"})

[screenshot of the query result]

I can't figure out what I'm doing wrong. The docs have a similar function, so I thought this should be OK, but apparently it is not.

What is the correct way to sum these numbers together?

Upvotes: 1

Views: 2506

Answers (2)

DazWilkin

Reputation: 40386

You write that "Prometheus still groups them", but the screenshots are from Grafana (not Prometheus), and the behavior of the two may differ.

When you filter a metric by specific labels and values (e.g. app="sumiu-web"), you restrict the set of measurements to the subset where that label has that value.

But (!) you do not restrict other labels, so you could, for example, have two measurements where one is for app="sumiu-web" and region="foo" and the other is for app="sumiu-web" and region="bar". These are different measurements.

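sum by(app) adds no further grouping here because you're already limiting measurements to only those where app="sumiu-web"; the result is a single series that adds up every matching measurement, whatever its other labels.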

What you probably want to do is sum values where app="sumiu-web" and you don't care (for example) what the value of region is. To do this, you can use without(region):

sum without(region) (http_server_requests_total{app="sumiu-web", path="/metrics"})

Note: You will likely want to include in the without all the other labels that apply to the metric http_server_requests_total whose different values you don't care about.
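A minimal sketch of how the two aggregation modifiers treat label sets. This is a plain Python model, not Prometheus code, and the `region` and `instance` values and the counts are made up for illustration:

```python
from collections import defaultdict

# Hypothetical instant-query result: one value per label set.
series = [
    ({"app": "sumiu-web", "path": "/metrics", "region": "foo", "instance": "a"}, 10.0),
    ({"app": "sumiu-web", "path": "/metrics", "region": "bar", "instance": "a"}, 5.0),
    ({"app": "sumiu-web", "path": "/metrics", "region": "foo", "instance": "b"}, 7.0),
]

def sum_without(series, dropped):
    """Model `sum without(...)`: drop the named labels, then sum per remaining label set."""
    groups = defaultdict(float)
    for labels, value in series:
        key = tuple(sorted((k, v) for k, v in labels.items() if k not in dropped))
        groups[key] += value
    return dict(groups)

def sum_by(series, kept):
    """Model `sum by(...)`: keep only the named labels, then sum per remaining label set."""
    groups = defaultdict(float)
    for labels, value in series:
        key = tuple(sorted((k, v) for k, v in labels.items() if k in kept))
        groups[key] += value
    return dict(groups)

without_region = sum_without(series, {"region"})
by_app = sum_by(series, {"app"})
```

In this model `without_region` still has one entry per remaining label combination (two, because `instance` differs), while `by_app` has a single entry. Every label whose values you don't care about has to be listed in `without` before the result collapses to one number.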

Upvotes: 1

valyala

Reputation: 18084

sum() is an aggregate function, so it sums the selected time series independently at each point on the graph.

The sum(http_server_requests_total) doesn't return the total sum of all the time series with the name http_server_requests_total because these time series do not have samples at every data point displayed on the graph - some of them stop receiving new samples, while others just appear and start receiving samples.

So how can gaps in time series with missing data points be filled, so that the series can be summed in the expected way? The only approach is to use the increase function with a lookbehind window in square brackets that covers all the matching time series from the beginning. For example, sum(increase(http_server_requests_total[1y])) returns a time series in which every point shows the total number of requests over the year-long lookbehind window ending at that point. This approach doesn't scale well, since Prometheus needs to fetch all the raw samples for the matching time series over the given lookbehind window and then calculate increase() for each point on the graph.

A more scalable approach exists in VictoriaMetrics, an alternative Prometheus-like monitoring solution I work on. This system provides two features that help with summing multiple metrics of counter type:
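A rough sketch of why summing raw counters and summing increases give different answers when series come and go. The sample values and the instance names "a" and "b" are hypothetical, and the model is deliberately simplified: it ignores Prometheus's staleness lookback, extrapolation, and counter-reset handling.

```python
# Hypothetical counter samples keyed by timestamp.
# Instance "a" stops after t=2 and is replaced by instance "b" at t=3.
samples = {
    "a": {0: 0, 1: 4, 2: 9},
    "b": {3: 0, 4: 3, 5: 8},
}

def sum_at(t):
    """Model sum(): add up only the series that have a sample at time t."""
    return sum(points[t] for points in samples.values() if t in points)

def increase(points, start, end):
    """Model increase(): last minus first sample inside [start, end]."""
    inside = [v for t, v in sorted(points.items()) if start <= t <= end]
    return inside[-1] - inside[0] if len(inside) >= 2 else 0

def total_increase(start, end):
    """Model sum(increase(...[window])): sum the per-series increases."""
    return sum(increase(points, start, end) for points in samples.values())

latest_sum = sum_at(5)            # only "b" has a sample at t=5
overall = total_increase(0, 5)    # both series contribute their growth
```

Here `latest_sum` reflects only the series that is still alive at the last point, while `overall` counts the requests recorded by both instances across the whole window.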

  • running_sum calculates a running total over the selected time range.
  • The lookbehind window in square brackets can be omitted when calling rollup functions such as increase. In this case the lookbehind window is automatically set to the interval between points on the graph (aka step).

Thanks to these features, the following query returns a time series showing the running total for all the requests over the selected time range on the graph:

sum(running_sum(increase(http_server_requests_total)))

Upvotes: 1
