Statistically finding performance differences

Question

I am generating load on a server and collecting performance metrics. Every 10 seconds I save a sample of each metric (IO utilization, CPU utilization, etc.).

I then make a change to the code, run another load test, and collect the same metrics.

I have a ton of metrics, so I'm looking for two things:

1. Which metrics changed the most?
2. Of those metrics that changed, how much did they change?

For the first task, I currently compute the Pearson correlation between the two runs for each metric and sort the metrics by lowest correlation.

For the second task, I pass the metrics with the lowest correlation to a function that compares each run's average and subtracts one from the other, i.e. sum(list_of_samples) / len(list_of_samples) - sum(list_of_samples2) / len(list_of_samples2)
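
For reference, the two steps above can be sketched roughly as follows. This is a minimal, stdlib-only illustration rather than the exact code: the dict layout (metric name -> list of samples), the function names pearson, rank_by_lowest_correlation and mean_difference, the sample values, and the choice to truncate both runs to the shorter length are all assumptions made for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two sample lists.

    Assumption: if the runs differ in length, both are truncated to the
    shorter one. Returns None when either series is constant, because the
    correlation is undefined in that case.
    """
    n = min(len(xs), len(ys))
    xs, ys = xs[:n], ys[:n]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


def rank_by_lowest_correlation(run1, run2):
    """Return metric names present in both runs, lowest correlation first."""
    scores = {}
    for name in run1:
        if name in run2:
            r = pearson(run1[name], run2[name])
            if r is not None:  # skip metrics with undefined correlation
                scores[name] = r
    return sorted(scores, key=scores.get)


def mean_difference(list_of_samples, list_of_samples2):
    """Average of the first run minus average of the second run."""
    return (sum(list_of_samples) / len(list_of_samples)
            - sum(list_of_samples2) / len(list_of_samples2))


# Made-up sample data: one list of 10-second samples per metric and run.
run1 = {"cpu_util": [10, 20, 30, 40], "io_util": [5, 6, 5, 6]}
run2 = {"cpu_util": [12, 22, 33, 41], "io_util": [6, 5, 6, 5]}

for metric in rank_by_lowest_correlation(run1, run2):
    print(metric, mean_difference(run1[metric], run2[metric]))
```

Each metric is ranked by how poorly the two runs track each other sample by sample, and the reported change is the plain difference of the two run averages.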

Unfortunately, I'm not getting good data, which I suspect is due to:

- The data is somewhat noisy.
- The load test varies the load to avoid overwhelming the server (increasing it at different rates).

Does anybody know how I can approach this better, or what improvements I can make? I'm currently writing in Python, but I can switch languages if there is some magic library that does this.
