Steven Morad

Reputation: 2617

Statistically finding performance differences

I am generating load on a server and collecting performance metrics: every 10 seconds I save a sample of data (IO utilization, CPU utilization, etc.).

Then I make a change to the code, run another load test, and collect the same metrics.

I have a ton of metrics, so I'm looking for two things:

For the first task, I'm currently computing the Pearson correlation between the two runs for each metric and sorting the metrics by LOWEST correlation.
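A minimal sketch of the correlation ranking in plain Python, assuming each run is stored as a dict mapping a metric name to its list of 10-second samples, and that the i-th sample of one run is paired with the i-th sample of the other (the longer run is truncated). The names `pearson` and `rank_by_lowest_correlation` are hypothetical; on Python 3.10+ `statistics.correlation` computes the same coefficient.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sample lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 samples")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return math.nan  # undefined when one series is constant
    return cov / math.sqrt(var_x * var_y)


def rank_by_lowest_correlation(run_a, run_b):
    """Return (metric, r) pairs for metrics present in both runs, lowest r first.

    Samples are paired by position; the longer run is truncated.
    Metrics with an undefined (NaN) correlation are placed last.
    """
    results = []
    for name in run_a.keys() & run_b.keys():
        n = min(len(run_a[name]), len(run_b[name]))
        results.append((name, pearson(run_a[name][:n], run_b[name][:n])))
    return sorted(results, key=lambda item: (math.isnan(item[1]), item[1]))
```

With `run_a = {"cpu": [1, 2, 3, 4], "io": [1, 2, 3, 4]}` and `run_b = {"cpu": [2, 4, 6, 8], "io": [4, 3, 2, 1]}`, "io" ranks first with r = -1.0, while "cpu" gets r = 1.0 even though run B doubled every sample: correlation measures whether two series move together over time, not whether their levels differ.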

For the second task, I pass the metrics with the lowest correlation to a function that simply subtracts one run's average from the other's, i.e. sum(list_of_samples) / len(list_of_samples) - sum(list_of_samples2) / len(list_of_samples2).
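The averaging step from the question, written out as a small function. It assumes each run is a dict mapping a metric name to a list of samples (a hypothetical layout, as are the function names) and additionally sorts the compared metrics by the size of the change, largest first.

```python
def mean(samples):
    """Arithmetic mean: sum(samples) / len(samples), as in the question."""
    return sum(samples) / len(samples)


def mean_differences(run_a, run_b, metrics):
    """Return [(metric, mean(run_a) - mean(run_b)), ...], largest absolute change first.

    run_a / run_b map a metric name to its list of samples (hypothetical layout);
    `metrics` is the subset to compare, e.g. the lowest-correlation ones.
    """
    diffs = [(name, mean(run_a[name]) - mean(run_b[name])) for name in metrics]
    return sorted(diffs, key=lambda item: abs(item[1]), reverse=True)
```

For example, CPU samples of [10, 20, 30] against [5, 15, 25] give a difference of 5.0.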

Unfortunately, I'm not getting good data, which I suspect is due to:

Does anybody know how I can approach this better, or some improvements I can make? I'm currently writing in Python, but I can switch languages if there is some magic library that does this.

Upvotes: 0

Views: 114

Answers (1)

Direvius

Reputation: 427

That's how we do it.

The first test estimates maximum throughput. Start one instance of the load generator (one thread, for example) that sends requests to the server one by one without any pauses, then add one more instance every minute. After a while, your throughput (processed requests per second) stops growing and may even drop slightly because of concurrency issues. That plateau is the maximum throughput for this test, and you can use it to compare tests as you change your code. You may also find some interesting bottlenecks while the server is under maximum load.
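A sketch of the stepping logic for the first test, assuming a hypothetical callable `measure(n)` that runs `n` load-generator instances for one step (one minute in the procedure above) and returns the observed requests per second. The stopping rule (stop after `patience` steps without more than `tolerance` relative growth) is an assumption added for illustration; the procedure itself only says to keep adding instances until throughput stops growing. `simulated_server` is a toy stand-in, not a real server.

```python
def find_max_throughput(measure, max_instances=64, tolerance=0.02, patience=2):
    """Step-load search for maximum throughput.

    measure(n) is a hypothetical callable that runs n load-generator instances
    for one step and returns the observed throughput in requests per second.
    Stops after `patience` consecutive steps that fail to beat the best
    throughput so far by more than `tolerance` (relative).
    Returns (best_rps, instances_at_best, history).
    """
    history = []
    best_rps, best_n, stalled = 0.0, 0, 0
    for n in range(1, max_instances + 1):
        rps = measure(n)
        history.append((n, rps))
        # Significant growth is judged against the best value seen before this step.
        grew = rps > best_rps * (1 + tolerance)
        if rps > best_rps:
            best_rps, best_n = rps, n
        stalled = 0 if grew else stalled + 1
        if stalled >= patience:
            break
    return best_rps, best_n, history


def simulated_server(n):
    """Toy stand-in: each instance adds 100 rps up to a 500 rps ceiling,
    after which contention costs 10 rps per extra instance."""
    return min(100 * n, 500) - max(0, n - 5) * 10
```

Against the toy server, `find_max_throughput(simulated_server)` stops after 7 steps and reports 500 rps at 5 instances.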

The second test estimates response times. Select a load level of about 50-80% of the maximum throughput found in the previous test, and generate constant load for about 10-15 minutes (the right duration depends on the system; you may also need a warm-up period before the actual test). Collect response times and resource usage, then compare statistics across runs: for response times, the 99th or 95th percentile; for resources, the average CPU and disk load.
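A sketch of the summary statistics for the second test, assuming response times (in milliseconds) and CPU/disk utilization samples have already been collected as non-empty lists. `percentile` uses the nearest-rank definition, which is an assumption; `statistics.quantiles` interpolates and can give slightly different values. All function names are hypothetical.

```python
import math


def target_load(max_rps, fraction=0.7):
    """Constant load for the response-time test; the answer suggests 50-80% of max."""
    return max_rps * fraction


def percentile(samples, p):
    """Nearest-rank percentile, 0 < p <= 100."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize(response_times_ms, cpu_util, disk_util):
    """Per-run statistics: tail latency plus average resource usage."""
    return {
        "p95_ms": percentile(response_times_ms, 95),
        "p99_ms": percentile(response_times_ms, 99),
        "avg_cpu": sum(cpu_util) / len(cpu_util),
        "avg_disk": sum(disk_util) / len(disk_util),
    }


def compare(before, after):
    """Change in each statistic (after - before); positive means it went up."""
    return {key: after[key] - before[key] for key in before}
```

With a measured maximum of 500 rps, `target_load(500)` gives 350 rps. Comparing tail percentiles rather than only averages keeps a few slow requests from being hidden by many fast ones.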

Upvotes: 1
