Reputation: 398
I've used Prometheus to store performance metrics and query the results as percentiles (e.g. 95th percentile response time). I used prometheus-net to emit them.
What is the equivalent in Azure AppInsights?
I see there are percentile functions in AppInsights/Kusto but when I use GetMetric("blah").TrackValue(42) it stores Count, Min, Max, Sum, and StdDev, which isn't the histogram bucketing approach I'm used to in Prometheus.
for (int i = 0; i < 500; i++) {
    // Write some metrics
    telemetryClient.GetMetric("blah").TrackValue(42); // real data isn't constant
}
customMetrics
| where name == "blah"
//| summarize avg(value), percentiles(value, 50, 95) by bin(timestamp, 2m)
Here is some data I logged with randomized values. The value column holds the sum of the tracked values, not the individual values, so I don't see how I can properly compute percentiles on this data.
Upvotes: 6
Views: 4652
Reputation: 5125
AFAIK, this is a common statistics problem. You can derive percentile values from the mean and standard deviation only if the data follows a normal distribution.
Also, calculating percentiles is a bit expensive compared to sum, count, min, max, and standard deviation, which can all be computed in a running fashion. I'm guessing that's why Application Insights does it this way.
Here is the formula:
Percentile Value = μ + zσ
where
μ: Mean
z: z-score from z table that corresponds to percentile value
σ: Standard deviation
Ref: https://www.statology.org/calculate-percentile-from-mean-standard-deviation/
The z-score for P95 is 1.645, and for P99 is 2.326.
Ref: https://www.mymathtables.com/statistic/z-score-percentile-normal-distribution.html
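As a quick sanity check of the formula, here is a minimal worked example in Kusto. The mean (100 ms) and standard deviation (20 ms) are made-up numbers for illustration, not values from any real metric, and the result is only meaningful if the data is roughly normal.

```kusto
// Hypothetical one-minute aggregate: mean 100 ms, standard deviation 20 ms.
print mu = 100.0, sigma = 20.0
| extend p95 = mu + 1.645 * sigma   // 100 + 32.9
| extend p99 = mu + 2.326 * sigma   // 100 + 46.52
```

With these inputs the estimates come out at about 132.9 ms for P95 and 146.52 ms for P99.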
So, here is the Kusto query. Note that I do a percentile() aggregation in summarize, but you could choose min(), max(), or avg() depending on your needs (for >1m bin intervals).
customMetrics
| where name == "<METRIC_NAME>"
| extend mean = value / valueCount
| extend p95_zscore = 1.645
| extend p95calc = mean + (p95_zscore * valueStdDev)
| extend p99_zscore = 2.326
| extend p99calc = mean + (p99_zscore * valueStdDev)
| summarize
    avg = sum(value) / sum(valueCount),
    p95 = percentile(p95calc, 95),
    p99 = percentile(p99calc, 99)
    by ts = bin(timestamp, 1m)
| render timechart
Update 1: To figure out whether the metric is normally distributed, take a few sample 1m intervals with all data points and plot them. In my case, the distribution was not normal, so the metric is useless for percentiles. I wish AppInsights pre-aggregated P95 and P99 values too. I guess I'll have to hand-roll my own implementation.
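Besides plotting, one rough way to run this check is to compare the normal-distribution estimate against the actual percentile on raw samples. The sketch below assumes you have the individual data points for an interval; the values in the datatable are invented and deliberately skewed by a few slow outliers.

```kusto
// Hypothetical raw samples (ms) from one interval; not real data.
datatable(value: real) [
    12.0, 14.0, 15.0, 15.0, 16.0, 17.0, 18.0, 18.0, 19.0, 20.0,
    21.0, 22.0, 23.0, 25.0, 27.0, 30.0, 35.0, 60.0, 150.0, 400.0
]
// Actual P95 versus the estimate derived from mean and standard deviation.
| summarize mu = avg(value), sigma = stdev(value), actual_p95 = percentile(value, 95)
| extend normal_p95 = mu + 1.645 * sigma
```

If actual_p95 and normal_p95 differ by a lot, the normal assumption probably does not hold and the z-score approach will mislead.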
PS: I'm not a stats person.
Upvotes: 0
Reputation: 2679
Individual values are not stored when the GetMetric().TrackValue() API is used with the default aggregations. One aggregated value is produced per minute and sent to AI with the sum/count/min/max/... distribution. Therefore, it's not possible to plot percentiles of the original data points in Analytics later on.
Only a few aggregations are currently available for the GetMetric().TrackValue() API, and histogram / t-digest is not one of them. You can submit a feature request (or a contribution) on the AI SDK GitHub repository.
The workaround for the time being would be to use the older API that submits a point-in-time metric by default, without aggregation: TrackMetric(), or a series of measurements in TrackEvent(). This will increase the number of telemetry items sent (each metric value is sent separately, without the 1-minute aggregation), but it gives you every value, so you can perform percentile aggregations in Analytics if necessary.
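Once each value arrives as its own telemetry item, the percentile query from the question should work more or less as written. Here is a minimal sketch, assuming each row then has valueCount == 1 and value equal to the single measurement. The datatable stands in for customMetrics, and its timestamps and values are made up.

```kusto
// Hypothetical rows, as they might look when every measurement is sent on its own.
datatable(timestamp: datetime, name: string, value: real, valueCount: int) [
    datetime(2024-01-01 00:00:10), "blah", 41.0, 1,
    datetime(2024-01-01 00:00:40), "blah", 57.0, 1,
    datetime(2024-01-01 00:01:15), "blah", 38.0, 1,
    datetime(2024-01-01 00:01:50), "blah", 120.0, 1,
    datetime(2024-01-01 00:02:20), "blah", 44.0, 1,
    datetime(2024-01-01 00:03:05), "blah", 63.0, 1
]
// Keep only un-aggregated rows, then compute percentiles per 2-minute bin.
| where name == "blah" and valueCount == 1
| summarize avg(value), percentiles(value, 50, 95) by bin(timestamp, 2m)
```

Against real data, replace the datatable with customMetrics. The valueCount filter is only a guard against pre-aggregated rows mixed into the same metric name.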
Upvotes: 1