Quuxplusone

Reputation: 27065

rrdtool: Compute 95th percentile of data within a sliding window

I'm using rrdtool to graph data about CPU usage as produced and stored by Munin. Munin (at least for us) stores each data-series in an .rrd file with 12 RRAs: "MIN", "MAX", and "AVERAGE" over each of the four periods "last 2d in 5m intervals", "last 9d in 30m intervals", "last 270d in 12h intervals", and "last 177y in 144d intervals".

I already know how to use rrdtool graph to produce a trend line indicating where my average CPU usage is going. (For simplicity, we can pretend I'm on a single-CPU system; in real life I have more code to deal with that.)

rrdtool graph /tmp/foo.png \
  --start -12w --end +24w \
  --lower-limit 0 --upper-limit 100 --rigid \
  --title 'cpu usage' --width 620 --height 200 --border 0 \
  --vertical-label 'cpu usage' \
  DEF:idle=/var/lib/munin/mybox/mybox-cpu-idle-d.rrd:42:AVERAGE \
  DEF:iowait=/var/lib/munin/mybox/mybox-cpu-iowait-d.rrd:42:AVERAGE \
  CDEF:percent_used=100,idle,-,iowait,- \
  AREA:percent_used#00880077:'cpu usage' \
  VDEF:fit_m=percent_used,LSLSLOPE \
  VDEF:fit_b=percent_used,LSLINT \
  CDEF:trendline=percent_used,POP,fit_m,COUNT,*,fit_b,+ \
  LINE1:trendline#FFBB00:'Trend since 12w ago'

The problem with this graph is that it shows only the average CPU usage trend. But my workload is spiky: usage is very low 90% of the time and then has brief spikes. What I really care about is the trend of the spikes in CPU usage.

So I could run the same command replacing AVERAGE with MAX... but the actual maxes are so randomly distributed (and usually close to 100%) that they don't produce any useful trend line.

So I'm thinking that the graph I actually want would be a graph of the 95th percentile (or maybe just the 75th percentile... ideally I'd be able to adjust the parameter), where that "percentile" is taken over the data in each consecutive 24-hour period.

Conceptually, I want to boil down our last 9 days of data (48 data points per day) into just 9 data points (1 data point per day — representing the Nth percentile of the 48 original points from that day).
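To make the intended reduction concrete, here is a minimal Python sketch of it, outside rrdtool. It assumes the samples are already available as (Unix timestamp, value) pairs, buckets them by UTC day, and uses a nearest-rank percentile, which may not match rrdtool's own PERCENTNAN rounding exactly. The names daily_percentiles and nearest_rank_percentile are made up for this illustration.

```python
import math
from collections import defaultdict

SECONDS_PER_DAY = 86400


def nearest_rank_percentile(values, pct):
    """Return the pct-th percentile (0 < pct <= 100) using the nearest-rank rule."""
    ordered = sorted(values)
    if not ordered:
        return math.nan
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def daily_percentiles(samples, pct):
    """Reduce (unix_timestamp, value) samples to one percentile per UTC day.

    Unknown values (None or NaN) are skipped, in the spirit of PERCENTNAN.
    Returns a list of (day_start_timestamp, percentile), oldest day first.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        if value is None or math.isnan(value):
            continue
        day_start = ts - (ts % SECONDS_PER_DAY)
        buckets[day_start].append(value)
    return [
        (day, nearest_rank_percentile(vals, pct))
        for day, vals in sorted(buckets.items())
    ]


if __name__ == "__main__":
    # Two synthetic days of 30-minute samples (48 points per day).
    demo = [
        (day * SECONDS_PER_DAY + i * 1800, float(day * 100 + i))
        for day in range(2)
        for i in range(48)
    ]
    for day_start, p95 in daily_percentiles(demo, 95):
        print(day_start, p95)
```

The percentile is a plain parameter here, so switching from 95 to 75 is a one-argument change.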

And then I'd fit a line to that data using LSLSLOPE and LSLINT and display it on the same graph as the rest of this stuff.
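For reference, the line fit itself is ordinary least squares. The sketch below is a plain-Python stand-in for what LSLSLOPE and LSLINT would provide, under the assumption that x is simply the position of each daily value (0, 1, 2, ...); it is not taken from rrdtool's implementation, and least_squares_line is a hypothetical helper name.

```python
def least_squares_line(ys):
    """Fit y = slope * x + intercept to ys, with x = 0, 1, 2, ... (an assumption)."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_x = (n - 1) / 2.0
    mean_y = sum(ys) / n
    # Sums of squared deviations in x, and of cross deviations in x and y.
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Nine made-up daily percentile values, one per day.
    daily_p95 = [61.0, 63.5, 62.0, 66.0, 64.5, 68.0, 67.0, 70.5, 69.0]
    slope, intercept = least_squares_line(daily_p95)
    print(f"slope per day: {slope:.3f}, intercept: {intercept:.3f}")
```

The values in the usage example are invented purely to exercise the function.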

But I can't figure out how to boil down the data in this way, using rrdtool's RPN facilities.

I know that I can use PERCENTNAN to get the scalar number that is the 95th percentile of my whole data-series, but I want a data-series consisting of 9 numbers, not just one scalar.

I know that I can use TRENDNAN to get a data-series that is the mean of a sliding window of my data-series, which would be good enough if only it gave me the median (50th percentile) instead of the mean, and then allowed me to adjust that parameter from "50" up to "95"... but it doesn't.


Alternatively, I know how to use Python to compute the series I want, using rrdtool first and rrdtool fetch, but then there's no simple way to feed that series back into rrdtool to create the graph.


I'm thinking maybe I could extract usage_today, usage_yesterday, usage_2d, usage_3d,... into nine separate series, use PERCENTNAN on them all individually, and then somehow fit a line to that. But that's mostly desperate handwaving; if someone posted an answer that actually made that approach work, I'd accept it.

Upvotes: 3

Views: 2696

Answers (1)

Steve Shipway

Reputation: 4027

RRDTool has 95th percentile functionality built in. Note that the accuracy of the percentile calculations will depend on the granularity of the data available in the requested time period, though... so the bigger your 1-pdp RRA is, the better.

So, for example, to get a horizontal line at the 95th percentile, we can use these directives:

  DEF:idlehr=/var/lib/munin/mybox/mybox-cpu-idle-d.rrd:42:AVERAGE:step=1 
  VDEF:pctidle=idlehr,95,PERCENTNAN
  HRULE:pctidle#ff0000:95th_Percentile

The step=1 on the end of the DEF ensures that the highest resolution data available will be selected. This may be computationally intensive if you're graphing a full year and high resolution data are available for that time window!

The problem, though, is that you want a graph showing a different value for each day -- in effect, a sliding window of percentile calculations, in the same way as TREND and PREDICT work, but with a step of one day. RRDTool cannot do this.

So, the answer is, you can show a graph for one day with a single value percentile for that day. You cannot create a graph with one data point per day, where that data point is calculated as the percentile for that day.

The only way I can think of to achieve this is to call rrdtool xport repeatedly, once per day, to calculate the percentile value for each day in the sequence, and then use that data to generate a bar graph in another graphing package.
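A rough Python sketch of the first half of that loop follows: it works out one (start, end) window per whole UTC day and builds an rrdtool xport command line for each. The helper names day_windows and xport_argv, the internal variable name v, and the legend "value" are all made up for this example; the file path and data-source name are the ones from the question. Running the commands, parsing their output, and computing the percentile per day are left out.

```python
SECONDS_PER_DAY = 86400


def day_windows(end_ts, days):
    """Return (start, end) Unix timestamps for the last `days` whole UTC days
    before end_ts, oldest first."""
    last_midnight = end_ts - (end_ts % SECONDS_PER_DAY)
    return [
        (last_midnight - (i + 1) * SECONDS_PER_DAY, last_midnight - i * SECONDS_PER_DAY)
        for i in range(days - 1, -1, -1)
    ]


def xport_argv(rrd_path, ds_name, start, end):
    """Build the argument list for one `rrdtool xport` call over [start, end]."""
    return [
        "rrdtool", "xport",
        "--start", str(start),
        "--end", str(end),
        f"DEF:v={rrd_path}:{ds_name}:AVERAGE",
        "XPORT:v:value",
    ]


if __name__ == "__main__":
    import time

    rrd = "/var/lib/munin/mybox/mybox-cpu-idle-d.rrd"
    for start, end in day_windows(int(time.time()), 9):
        # Each list could be handed to subprocess.run(); here it is only printed.
        print(" ".join(xport_argv(rrd, "42", start, end)))
```

Building the argument lists separately from executing them keeps the windowing logic easy to check before any rrdtool process is started.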

Upvotes: 1
