Pierre
Pierre

Reputation: 13046

How to calculate the percentage by label for gauge metrics?

I'm exporting some metrics about running tasks, the available data includes the current number of tasks by their status and queue:

# TYPE gauge
tasks{queue="high", status="queued"} 2.0
tasks{queue="high", status="started"} 1.0
tasks{queue="high", status="successful"} 5.0
tasks{queue="high", status="failed"} 1.0

tasks{queue="low", status="queued"} 1.0
tasks{queue="low", status="started"} 2.0
tasks{queue="low", status="successful"} 3.0
tasks{queue="low", status="failed"} 2.0

These numbers change regularly when the tasks are added or expired from the database, so for example the failed tasks number will go up and down depending on the tasks in the database at the time of collecting the data.

I don't have a way to get the total tasks count, so that's all the data that I have, I want to calculate the percentage of the tasks by their status label and create a graph of this value using Grafana.

How the percentage should be calculated?

What I've tried so far:

Get the percentage of all successful tasks:

( sum(tasks{status="successful"}) / sum(tasks) ) * 100

Get the percentage of successful tasks by queue:

( sum(tasks{status="finished"}) by (queue) / sum(tasks) by (queue) ) * 100

How can I get this percentage by time? For example when setting the time range in Grafana? I can use the variable $__range but how should I do the calculation?

I have other data where I have count metrics and I'm doing the following:

sum(increase(tasks_total{status="success"}[$__range])) /
sum(increase(tasks_total{status="started"}[$__range]))

But these are counts and these calculation don't apply to gauge metrics.

Upvotes: 4

Views: 10449

Answers (2)

valyala
valyala

Reputation: 17800

The per-status percentage for the currently running tasks can be calculated with the following PromQL query:

100 * (sum(tasks) by (status) / on() group_left() sum(tasks))

It uses on() and group_left() modifiers for / operation in order to augment the default behavior for matching time series on the left and the right side of / operation. See these docs for details.

If you need to get per-queue per-status percentage for the currently running tasks, then the following query should work:

100 * (
  sum(tasks) by (queue, status)
    / on(queue) group_left()
  sum(tasks) by (queue)
)

If you need per-status percentage for tasks, which were run during the selected time $__range, then the following query should work:

100 * (
  sum(sum_over_time(tasks[$__range])) by (status)
    / on() group_left()
  sum(sum_over_time(tasks[$__range]))
)

It uses sum_over_time function for aggregating the number of tasks over the given $__range. The $__range can be substituted with any supported time duration - see these docs.

Upvotes: 6

Weston A. Greene
Weston A. Greene

Reputation: 127

I think you need to use group_left. See an example: https://www.robustperception.io/using-group_left-to-calculate-label-proportions

In your case:

sum without (queue) tasks{status="successful"}
/ ignoring(status) group_left
sum without (queue, status) tasks

Upvotes: 0

Related Questions