R.M.

Reputation: 3643

How to get an accurate performance measure?

In our project we're trying to automatically monitor the performance of test runs, to make sure that we don't have any significant changes in the performance of the program over time.

The problem is that there seems to be a consistent 5% variability in the measurements we get. That is, on the same machine, with the same program (no recompilation), running the same test, we get values that differ by around 5% from run to run. This is far too much for what we want to use the numbers for.

We're already excluding setup costs from the timing - that is, from within the C++ code itself we grab the time immediately before and after running the time-critical portions, rather than timing the whole program at the OS level. We are also averaging and excluding outliers (sketched below). The problem is that the variability also appears to have long-term trends: replicates run right after each other cluster tightly, but an hour or two later the times are substantially different. (Unfortunately, spreading the test out over several hours is not feasible.) The tests are also run on a dedicated machine while "nothing else" is running on it.
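
Roughly, the summary step looks like this (the 10% trim fraction and the name robustEstimate are illustrative, not our actual harness):

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // Summarize repeated timings robustly: sort, drop the extremes,
    // and take the median of what remains. 'samples' holds one time
    // per run, in seconds, and is assumed to be non-empty.
    double robustEstimate(std::vector<double> samples) {
        std::sort(samples.begin(), samples.end());
        std::size_t trim = samples.size() / 10;   // drop top/bottom 10%
        std::vector<double> kept(samples.begin() + trim,
                                 samples.end() - trim);
        return kept[kept.size() / 2];             // median of the rest
    }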

We're not quite sure where the timing variation is coming from, but it may have to do with the processor and the system - there are indications that the size of the variability depends on which machine the program is running on.

Does anyone have an idea where this variation is likely to be coming from, and how to remove it? The tests are running on a dedicated machine, so changing the operating system settings would be possible.

(As indicated by the tags, this is a C++ program running on a x86 Linux system, if that helps clarify things.)

Edit: Response to comments

Our current timing scheme is to use the clock() function from the C standard library, looking at the difference in the return value from before/after the functions we want to test.
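
In outline, the scheme looks like the sketch below (timeCriticalPart is a placeholder for the real code under test):

    #include <cstdio>
    #include <ctime>

    // Placeholder for the real time-critical code.
    void timeCriticalPart() {
        volatile double x = 0;
        for (int i = 0; i < 10000000; ++i)
            x = x + i;
    }

    int main() {
        // Note: on Linux, clock() reports CPU time consumed by the
        // process, not elapsed wall-clock time.
        std::clock_t before = std::clock();
        timeCriticalPart();
        std::clock_t after = std::clock();
        std::printf("elapsed: %.6f s\n",
                    double(after - before) / CLOCKS_PER_SEC);
        return 0;
    }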

The code we're testing should be deterministic, and shouldn't involve heavy IO.

I realize that the situation is a little hazy for a "silver bullet" answer. I guess I'm more looking for a "these are the factors that are important to consider, this is the order you probably should check them in, and here's how you go about checking each of them" type answer.

Upvotes: 1

Views: 614

Answers (1)

Thomas Matthews

Reputation: 57749

I'm amazed you got down to 5% variation.

Unless you can get rid of everything unnecessary running on your system, you will see high variation. That is the top-level issue.
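
One mitigation you can apply from inside the program itself is to pin the test to a single core, so the scheduler cannot migrate it between cores mid-run. A sketch using the Linux-specific sched_setaffinity (the core number is arbitrary; glibc exposes the CPU_* macros when building with g++):

    #include <sched.h>
    #include <cstdio>

    int main() {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(2, &set);  // pin to core 2; pick any core you like
        // pid 0 means "the calling process"
        if (sched_setaffinity(0, sizeof(set), &set) != 0) {
            std::perror("sched_setaffinity");
            return 1;
        }
        // ... run the timed tests here ...
        return 0;
    }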

Your OS needs to be deterministic. You need to know what other tasks and threads are running and how long they run. For example, there is the clock interrupt: how many other functions are chained to this interrupt, and do their durations vary?
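
One way to check this on Linux is to snapshot /proc/interrupts before and after a run and diff the two; a rough sketch (runTest is a hypothetical entry point for your test):

    #include <fstream>
    #include <iostream>
    #include <string>

    // Print /proc/interrupts so that two snapshots, taken before and
    // after a test run, can be diffed to see what fired in between.
    void dumpInterrupts(const char* label) {
        std::ifstream in("/proc/interrupts");
        std::string line;
        std::cout << "=== " << label << " ===\n";
        while (std::getline(in, line))
            std::cout << line << '\n';
    }

    // Usage around a test:
    //   dumpInterrupts("before");
    //   runTest();
    //   dumpInterrupts("after");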

Is your system isolated? For example, your measurements may vary if your system is connected to a network.

Does your program use external resources, such as a hard drive? If the program writes to the drive, the timing will not be deterministic: files and parts of files move around on the disk, and the drive may become fragmented. That fragmentation can add variance to your measurements.

The operating system's memory may become fragmented, and so may your executable's own memory. Either kind of fragmentation can add to the variance.
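
For the paging side of this, one mitigation worth trying is to lock the process's pages into RAM with mlockall, which removes page faults and swapping as sources of run-to-run variance; a sketch, assuming a Linux target and sufficient privileges (CAP_IPC_LOCK or a generous RLIMIT_MEMLOCK):

    #include <sys/mman.h>
    #include <cstdio>

    int main() {
        // Lock all current and future pages into physical memory so
        // page faults and swapping cannot perturb the timed section.
        if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
            std::perror("mlockall");
            return 1;
        }
        // ... run the timed tests here ...
        return 0;
    }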

Upvotes: 2
