AndyV

Reputation: 398

How to do percentiles on custom metrics in Azure AppInsights?

I've used Prometheus to store performance metrics and query the results as percentiles (e.g., 95th-percentile response time). I used prometheus-net to emit them.

What is the equivalent in Azure AppInsights?

I see there are percentile functions in AppInsights/Kusto but when I use GetMetric("blah").TrackValue(42) it stores Count, Min, Max, Sum, and StdDev, which isn't the histogram bucketing approach I'm used to in Prometheus.

for (int i = 0; i < 500; i++) {
  // Write some metrics
  telemetryClient.GetMetric("blah").TrackValue(42); // real data isn't constant
}

customMetrics
| where name == "blah"
//| summarize avg(value), percentiles(value, 50, 95) by bin(timestamp, 2m)

Here is some data I logged with randomized values. The value column holds the sum of the tracked values rather than the individual measurements, so I don't see how I can properly compute percentiles from this data.

Upvotes: 6

Views: 4652

Answers (2)

hIpPy

Reputation: 5125

AFAIK, this is a common statistics problem. You can derive percentile values from the mean and standard deviation alone only if you know the shape of the distribution, for example that it is normal.

Also, calculating percentile values is a bit expensive compared to sum, count, min, max, and standard deviation, which can all be computed in a running fashion. So, I'm guessing that's why Application Insights does this.

Here is the formula, which holds for a normal distribution:

Percentile Value = μ + zσ

where

μ: Mean
z: z-score from a z-table that corresponds to the desired percentile
σ: Standard deviation

Ref: https://www.statology.org/calculate-percentile-from-mean-standard-deviation/

The z-score value for P95 is 1.645, and for P99 is 2.326.

Ref: https://www.mymathtables.com/statistic/z-score-percentile-normal-distribution.html
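To make the formula concrete, here is a quick worked example with made-up numbers. The mean of 100 and standard deviation of 20 are purely illustrative and are not taken from the question's data; the z-scores are the ones quoted above.

```kusto
// Hypothetical inputs: mean = 100, standard deviation = 20
print mean = 100.0, stddev = 20.0
// Percentile Value = mean + z * stddev, assuming a normal distribution
| extend p95 = mean + 1.645 * stddev
| extend p99 = mean + 2.326 * stddev
```

This should come out at roughly 132.9 for P95 and 146.5 for P99, and the estimate is only meaningful if the underlying values really are normally distributed.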

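So, here is the Kusto query. Each customMetrics row is already a pre-aggregated interval, so the query first estimates P95 and P99 per row and then combines the rows that fall into each bin. I use a percentile() aggregation in summarize for that, but you could choose min(), max(), or avg() depending on your needs (this only matters for bin intervals longer than 1m).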

customMetrics
| where name == "<METRIC_NAME>"
// value holds the sum for the interval, valueCount the number of samples
| extend mean = value / valueCount
// z-scores for P95 and P99 of a normal distribution
| extend p95_zscore = 1.645
| extend p95calc = mean + (p95_zscore * valueStdDev)
| extend p99_zscore = 2.326
| extend p99calc = mean + (p99_zscore * valueStdDev)
| summarize
    avg = sum(value) / sum(valueCount),
    p95 = percentile(p95calc, 95),
    p99 = percentile(p99calc, 99)
    by ts = bin(timestamp, 1m)
| render timechart

Update 1: To figure out whether the metric follows a normal distribution, take a few sample 1m intervals with all data points and plot them. In my case, it was not a normal distribution, so this approach is useless for getting percentiles from that metric. I wish AppInsights provided pre-aggregated P95 and P99 values too. I guess I'll have to hand-roll my own implementation.
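One way to do that plot is a simple histogram in Kusto. This is only a sketch, and it assumes the raw data points are available as individual customMetrics rows (valueCount == 1), which is not the case for the pre-aggregated rows that GetMetric().TrackValue() produces. The bucket width of 10 and the choice of a 1m window starting ten minutes ago are arbitrary placeholders.

```kusto
customMetrics
| where name == "<METRIC_NAME>"
// Assumption: keep only rows that carry a single raw measurement
| where valueCount == 1
// One sample 1m interval; adjust the window to inspect other intervals
| where timestamp between (ago(10m) .. ago(9m))
// Count how many measurements fall into each value bucket
| summarize samples = count() by bucket = bin(value, 10)
| order by bucket asc
| render columnchart
```

A roughly symmetric bell shape suggests the normal approximation is usable; a long tail on one side suggests it is not.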

PS: I'm not a stats person.

Upvotes: 0

Dmitry Matveev

Reputation: 2679

Individual values are not stored when the GetMetric().TrackValue() API is used with the default aggregations. Instead, one aggregated value is produced per 1-minute interval and sent to Application Insights (AI) with its sum, count, min, max, and so on. Therefore, it's not possible to plot percentiles of the original data points in Analytics later on.

Only a few aggregations are currently available for the GetMetric().TrackValue() API, and histogram and t-digest are not among them. You can submit a feature request (or a contribution) on the AI SDK GitHub repository.

For the time being, the workaround is to use the older API that submits point-in-time metrics by default, without aggregation: TrackMetric(), or a series of measurements in TrackEvent(). This increases the number of telemetry items sent (each metric value is sent separately, without the 1-minute aggregation), but it gives you every individual value, so you can run percentile aggregations in Analytics if necessary.
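Once each measurement arrives as its own customMetrics row, the percentile query from the question should work more or less as written. A minimal sketch, assuming the metric is named "blah" as in the question and that every row holds a single value sent via TrackMetric(); the 2m bin and the chosen percentiles are just examples.

```kusto
customMetrics
| where name == "blah"
// Each row is one raw measurement, so percentiles() sees the real data points
| summarize avg(value), percentiles(value, 50, 95, 99) by bin(timestamp, 2m)
| render timechart
```

Keep in mind that sampling, if enabled, can drop some of these items, which would skew the percentiles.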

Upvotes: 1
