m4r73n

Reputation: 806

Using a Grafana Histogram with Prometheus Buckets

I have a Prometheus metric called latency with a number of buckets. I'm using an increase query to count all the events that happened in the last 15 minutes, across all buckets: (screenshot: query)

This query works well, switching to table view shows numbers that make sense: most latencies are below 300ms, with some above that value:

(screenshot: table)

However, when I use a Grafana Histogram panel, the x and y axes seem to be swapped: (screenshot: Grafana histogram)

I could use a different visualization, such as a Bar Gauge. I tried that, but it doesn't work well: I have too many buckets, so the labels become illegible. It also forces me to display every bucket that my application collects; I would prefer that this weren't set in stone and that I could aggregate buckets in Grafana. It also stops working well once I change the buckets to exponential sizes.

Any hints on how to either get the Histogram working properly (x axis: bucket in seconds, y axis: count), or on another visualization that would be appropriate here? My preferred outcome would be something like the plot of a function.

Upvotes: 4

Views: 22693

Answers (1)

Sascha Doerdelmann

Reputation: 836

Answer

  1. The Grafana panel type "Bar gauge" with format option "Heatmap" and interval $__range seems to be the best option if you have a small number of buckets. There is no proper solution for a large number of buckets yet.
  2. The documentation states that the format option "Heatmap" should work with panel type "Heatmap" (and it does), see Introduction to histograms and heatmaps with Pre-bucketed data. The Heatmap panel has an option to show a histogram on mouseover, so you might want to use this.
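Some background on why the format option matters: Prometheus histogram buckets are cumulative (each le bucket counts all observations up to its upper bound), and as far as I understand the "Heatmap" format turns them into per-bucket counts before they are drawn. The following Python sketch illustrates that conversion only; the function name and the bucket values are made up for the example and are not taken from the question.

```python
import math


def per_bucket_counts(cumulative):
    """Convert cumulative `le` bucket counts into per-bucket counts.

    `cumulative` maps an upper bound (the `le` label as a float) to the
    number of observations less than or equal to that bound.
    """
    result = []
    previous = 0.0
    for upper in sorted(cumulative):
        count = cumulative[upper]
        # Each bucket's own count is the difference to the next lower bucket.
        result.append((upper, count - previous))
        previous = count
    return result


# Hypothetical cumulative counts for four buckets (le = 0.1, 0.3, 1, +Inf).
buckets = {0.1: 40.0, 0.3: 90.0, 1.0: 98.0, math.inf: 100.0}

for upper, count in per_bucket_counts(buckets):
    print(f"le={upper}: {count}")
```

Plotting the cumulative values directly would make the last bar the tallest regardless of where the latencies actually fall, which is one reason the raw "Time series" format tends to look wrong in bar-style panels.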

About panel type Histogram

The Grafana panel type "Histogram" computes a value distribution itself, so the value of each bucket is a count of data points. This panel type does not work well with Prometheus histograms, even if you switch from format option "Time series" to "Heatmap". I don't know if this is due to the beta status of this panel type in the Grafana version I am currently using (9.2.4). There are also open bugs claiming that the maximum value of the x axis is not computed correctly, see issue 32006 and issue 33073.

The larger the number of buckets, the better the estimation of histogram_quantile(). You could let the Histogram panel calculate a distribution of latencies by using this function. Let's start with the following query:

histogram_quantile(1, sum by (le) (rate(latency_bucket{...}[$__rate_interval])))

You could now visualize the query results with the Histogram panel and set the bucket size to a very small number such as 0.1. The resulting histogram ignores a significant number of samples, because each data point only reflects the maximum latency within $__rate_interval.

The values on the y axis depend on the interval. The smaller the interval, the higher the values, simply because the query result contains more data points. This is a big downside: you lose the exact number of data points that you originally had in the buckets.

I cannot really recommend this, but it might be worth a try.
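To illustrate why more buckets improve the estimate of histogram_quantile(), here is a simplified Python sketch of the linear interpolation that the function performs inside the bucket containing the requested rank. It is not the Prometheus implementation: it assumes positive bucket bounds, 0 < q <= 1, and at least one observation, and all bucket values are hypothetical.

```python
import math


def histogram_quantile(q, cumulative):
    """Estimate the q-quantile from cumulative `le` bucket counts.

    Simplified sketch: assumes positive bounds, 0 < q <= 1 and a
    non-empty histogram whose highest bucket holds the total count.
    """
    bounds = sorted(cumulative)
    total = cumulative[bounds[-1]]
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for upper in bounds:
        count = cumulative[upper]
        if count >= rank:
            if math.isinf(upper):
                # No finite upper bound to interpolate to: fall back to
                # the highest finite bucket bound.
                return bounds[-2]
            # Linear interpolation inside the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (upper - prev_bound) * fraction
        prev_bound, prev_count = upper, count
    return bounds[-1]


# The same 100 hypothetical observations, bucketed coarsely and finely.
coarse = {1.0: 100.0, math.inf: 100.0}
fine = {0.1: 40.0, 0.3: 90.0, 1.0: 100.0, math.inf: 100.0}

print(histogram_quantile(0.5, coarse))
print(histogram_quantile(0.5, fine))
print(histogram_quantile(1, fine))
```

With the coarse buckets the median is estimated at about 0.5, with the finer buckets at about 0.14, because the interpolation happens within a much narrower bucket. With q = 1 the result is the upper bound of the highest non-empty finite bucket, which is why the query above only yields an approximate maximum.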

Additional notes

Grafana has transform functions such as "Create heatmap" and "Histogram", but these are not useful for Prometheus histogram data. Note that "Create heatmap" allows you to set one dimension to logarithmic.

There are two interesting design documents which show that the developers of Prometheus are aware of the problems with the current implementation of histograms and are working on some promising features:

  • Sparse high-resolution histograms for Prometheus
  • Prometheus Sparse Histograms and PromQL

See DESIGN DOCUMENTS.

There is also a feature request: Prometheus histogram as stacked chart over time #11464.

There is an excellent overview of histograms: How to visualize Prometheus histograms in Grafana.

To set a color for each bar, use overrides as described in How to get different colour for each legend in a bar chart.

Upvotes: 7
