Reputation: 33735
I am using prom-client in nodejs to publish a /metrics endpoint. I want to monitor sales of varying amounts which occur sporadically over time.
What is the best way to track a sporadic or discontinuous metric in Prometheus? None of the existing metric types seem to be a good fit.
Gauge is geared towards continuous data (such as CPU speed or concurrent requests).
Histogram metric can capture discontinuous data, but requires manual percentiles and apparently only estimates quantiles (https://prometheus.io/docs/practices/histograms/#errors-of-quantile-estimation). Also the counts are wiped out when the metrics server restarts.
Summary metric can capture discontinuous data, but is “in general not aggregatable” (https://latencytipoftheday.blogspot.com/2014/06/latencytipoftheday-you-cant-average.html).
Here is a simple setup with a Gauge, which obviously does not capture the
import express from 'express'
import promClient, { Gauge } from 'prom-client'

export const someMetric = new Gauge({
  name: 'some_metric',
  help: 'Track some metric; type = [a, b, c]',
  labelNames: ['one', 'two'],
})

const metricServer = express()

metricServer.get('/metrics', async (req, res) => {
  console.log('Metrics scraped')
  res
    // use the content type the registry itself advertises
    .set('Content-Type', promClient.register.contentType)
    .send(await promClient.register.metrics())
})

// intermittent callback that reports sales
// (`service` is the application's own event source, defined elsewhere)
service.onSale(value => {
  // this will simply overwrite the previous sale :(
  someMetric.labels('a', 'all').set(value)
})

metricServer.listen(9991, () =>
  console.log(`🚨 Prometheus listening on http://localhost:9991/metrics`)
)
My current plan is to create a new database to internally track a rolling 24-hr average of sales, and then expose that as a single continuous metric to Prometheus. It seems awkward, though, to keep a rolling average internally when Prometheus already has its own aggregation capabilities.
Upvotes: 2
Views: 1704
Reputation: 454
Without knowing the exact purpose behind capturing this data, it's hard to tell whether a Gauge, Summary or Histogram would best fit your needs, but I'll do my best with my assumptions. First, here is a simplified picture of what Prometheus does, which may help visualize where I'm headed.
Prometheus is a time series database. Every time your endpoint is scraped, it stores a snapshot of your metrics with their current values at that timestamp, so in a very simplified view you end up with something like <timestamp, your_metric{label="1"} value>.
Assuming that what you want is to capture only the amount of money paid during a sale and you have a finite number of customers, a Gauge can show the paid amount at any given time, with a label* distinguishing the customers (though a counter would do just fine too).
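The difference between the two is easiest to see in a toy simulation. The sketch below does not use prom-client; simulateScrapes and its parameters are hypothetical names, and it only assumes that a gauge keeps the last value it was set to while a counter only ever adds. Sales that land between two scrapes are lost with the gauge but remain visible in the counter totals:

```javascript
// Toy model of scraping, not a real Prometheus client.
// sales: [{ t, value }] sorted by time t; scrapeTimes: sorted scrape timestamps.
function simulateScrapes(sales, scrapeTimes) {
  let gauge = null // last value passed to set(); null until the first sale
  let total = 0    // counter: running sum of sale amounts
  let count = 0    // counter: running number of sales
  let next = 0     // index of the next sale not yet applied
  return scrapeTimes.map(t => {
    // Apply every sale that happened up to and including this scrape.
    while (next < sales.length && sales[next].t <= t) {
      gauge = sales[next].value
      total += sales[next].value
      count += 1
      next += 1
    }
    return { t, gauge, total, count }
  })
}

// Made-up sales: two close together, one much later.
const samples = simulateScrapes(
  [{ t: 1, value: 10 }, { t: 1.5, value: 30 }, { t: 20, value: 50 }],
  [0, 2, 22]
)
console.log(samples)
```

At the scrape at t = 2 the gauge shows only 30 (the sale of 10 was overwritten), while the counter total is 40 for 2 sales. The average sale between two scrapes can therefore still be recovered as the difference in totals divided by the difference in counts.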
Now, your question was about keeping track of the data. Plotting it shouldn't be an issue: even though the data is not continuous, you'll see it in any plotter, e.g. Grafana. However, isolated dots (<timestamp, value of your metric for each label combination>) or short lines tell no story on their own, which makes them almost meaningless and hard to follow. To make this data continuous, you can aggregate over time: instead of an aggregated value at each timestamp, you get the value aggregated across your selected time window.
Let's try to visualize this:
Prometheus scrapes the data every 2 seconds. In 30 minutes, your gauge records only 4 sales: two at minute 1 by two different customers and two at minute 20 by two different customers. If you plot this as is, you'll see 4 dots. If you aggregate it, e.g. by average, you'll see 2 dots, at minute 1 and minute 20, each holding the average of the two sales at that time.
If you're interested in a continuous story, e.g. the average of sales in a given time period, you need to aggregate over time. The crucial difference: at any plotted point, you see the value aggregated over the selected time window ending at that timestamp. So if, in the example above, you use avg_over_time instead of avg with a time window of 30 minutes, you'd have no value until minute 1; from minute 1 until minute 20 you'd see the average of the two sales that happened at minute 1; from minute 20 to minute 31 (30 minutes after the two sales from minute 1) you'd see the average of all 4 sales; from minute 31 to minute 50 you'd see the average of the last 2 sales; and from minute 50 on, no value again. If you select a larger time window, like 24 hours, you get the same effect. Just bear in mind that the larger the window, the more computationally intensive the query is for Prometheus. Having a lot of labels*, each with a high variance of values, will make such time windows very slow. The query for this would look like:
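For the some_metric gauge from the question, the query would presumably be along the lines of avg_over_time(some_metric[30m]); that expression is an assumption based on the description above, not necessarily the exact query intended here. The windowing it describes can be sketched in plain JavaScript. The helper avgOverWindow is a hypothetical name, each sale is treated as a single sample as in the walkthrough (a real Gauge keeps reporting its last value on every scrape), labels are ignored, and an empty window yields null because Prometheus returns no sample for a window without data:

```javascript
// samples: [{ t, value }] with t in minutes; windowMinutes: size of the lookback window.
// Returns the mean of all samples with t in (now - windowMinutes, now], or null if none.
function avgOverWindow(samples, now, windowMinutes) {
  const inWindow = samples.filter(s => s.t > now - windowMinutes && s.t <= now)
  if (inWindow.length === 0) return null
  const sum = inWindow.reduce((acc, s) => acc + s.value, 0)
  return sum / inWindow.length
}

// The four sales from the example; the amounts are made up for illustration.
const sales = [
  { t: 1, value: 10 }, { t: 1, value: 30 },
  { t: 20, value: 50 }, { t: 20, value: 70 },
]

// Evaluate the 30-minute window at a few points along the timeline.
for (const now of [0, 10, 25, 31, 50]) {
  console.log(now, avgOverWindow(sales, now, 30))
}
```

With these amounts the window average moves from the first pair of sales, to all four, to the last pair, and then back to nothing once every sale has left the window.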
*I emphasize the importance of the cardinality of a metric: the more labels you add to a metric, the more entries Prometheus has to go over to do calculations, since it creates a separate time series for each label combination.
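As a rough illustration of how quickly this grows, an upper bound on the number of series for one metric is the product of the number of distinct values per label. The function name and the label values below are hypothetical; only label combinations that are actually observed create a series, so the real count can be lower:

```javascript
// labelValues: object mapping each label name to the array of values it can take.
// Returns the largest number of time series those labels could produce.
function maxSeriesCount(labelValues) {
  return Object.values(labelValues).reduce(
    (product, values) => product * values.length,
    1
  )
}

// Labels 'one' and 'two' as in the question's gauge; the values are made up.
const upperBound = maxSeriesCount({
  one: ['a', 'b', 'c'],
  two: ['all', 'online', 'in_store', 'wholesale'],
})
console.log(upperBound)
```

Adding a third label with ten possible values would multiply this bound by ten, which is why per-customer labels get expensive quickly.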
Upvotes: 1