Radhika Padia

Reputation: 31

Micro benchmarking C++ Linux

I am benchmarking a function in my C++ program using inline rdtsc (as the first and last instructions in the function). My setup has isolated cores, hyper-threading is off, and the frequency is 3.5 GHz.
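For reference, a minimal sketch of this kind of instrumentation, not the actual code from the program. The helper names (`read_start`, `read_end`, `time_ticks`) are hypothetical. It assumes GCC or Clang on x86-64 and adds fences so that out-of-order execution does not move work across the timestamps. Note that on modern CPUs the TSC typically ticks at a constant reference rate, so the result is in TSC ticks rather than actual core clock cycles. The non-x86 branch exists only so the sketch still compiles elsewhere.

```cpp
#include <chrono>
#include <cstdint>
#if defined(__x86_64__)
#include <x86intrin.h>
#endif

// Hypothetical helper: timestamp taken just before the timed region.
inline std::uint64_t read_start() {
#if defined(__x86_64__)
    _mm_lfence();                      // let earlier instructions finish first
    const std::uint64_t t = __rdtsc();
    _mm_lfence();                      // keep the timed code from starting early
    return t;
#else
    // Assumption: fallback for non-x86 builds, in steady_clock ticks.
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// Hypothetical helper: timestamp taken just after the timed region.
inline std::uint64_t read_end() {
#if defined(__x86_64__)
    unsigned aux = 0;
    const std::uint64_t t = __rdtscp(&aux);  // waits for prior instructions
    _mm_lfence();                            // keep later code from moving up
    return t;
#else
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// Runs fn once and returns the elapsed ticks.
template <typename F>
std::uint64_t time_ticks(F&& fn) {
    const std::uint64_t start = read_start();
    fn();
    const std::uint64_t end = read_end();
    return end - start;
}
```

The fences add a small fixed overhead to every sample, so it is worth measuring an empty region once and keeping that baseline in mind when comparing against a budget.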

I cannot afford more than 1000 CPU cycles, so I count the percentage of calls taking more than 1000 CPU cycles, and that is approximately 2-3%. The structure being accessed in the code is huge and can certainly result in cache misses. But a cache miss is 300-400 CPU cycles.
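The counting described above can be sketched as follows. This is an illustration under assumptions, not the program's real bookkeeping: `fraction_over_budget` is a hypothetical name, and the samples are assumed to be per-call cycle counts collected into a vector.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Fraction of samples strictly above `budget` cycles.
// Returns 0.0 for an empty sample set.
double fraction_over_budget(const std::vector<std::uint64_t>& samples,
                            std::uint64_t budget) {
    if (samples.empty()) return 0.0;
    const auto over = std::count_if(
        samples.begin(), samples.end(),
        [budget](std::uint64_t cycles) { return cycles > budget; });
    return static_cast<double>(over) / static_cast<double>(samples.size());
}
```

Keeping the raw samples around, rather than only a counter, also makes it possible to look at how far over budget the slow calls are, which helps separate a few extra cache misses from something much larger.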

Is there a problem with rdtsc benchmarking? If not, what else can cause 2-3% of my calls, which all go through the same set of instructions, to take an abruptly high number of cycles?

I want help understanding what I should look for to explain this 2-3% of worst cases (WC).

Upvotes: 3

Views: 360

Answers (1)

Dan Bonachea

Reputation: 2487

Often rare "performance noise" like you describe is caused by context switches in the timed region, where your process happened to exceed its scheduler quantum during your interval and some other process was scheduled to run on the core. Another possibility is a core migration by the kernel.
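One way to check the context-switch theory is to read the kernel's per-thread switch counters before and after a run and see whether they move. The sketch below assumes Linux with glibc and uses `getrusage` with `RUSAGE_THREAD`; the `SwitchCounts` struct and `thread_context_switches` function are hypothetical names. Since `getrusage` is a system call, it should sit outside the timed region, and probably only in diagnostic builds.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // RUSAGE_THREAD is a Linux extension
#endif
#include <sys/resource.h>

// Hypothetical holder for the two counters the kernel reports.
struct SwitchCounts {
    long voluntary;    // thread gave up the CPU (e.g. blocked)
    long involuntary;  // thread was preempted by the scheduler
};

// Reads the context-switch counters for the calling thread.
// Returns {-1, -1} if getrusage fails.
SwitchCounts thread_context_switches() {
    rusage ru{};
    if (getrusage(RUSAGE_THREAD, &ru) != 0) return {-1, -1};
    return {ru.ru_nvcsw, ru.ru_nivcsw};
}
```

If the involuntary count increases across exactly the calls that blow the budget, preemption is a likely culprit. If it stays flat, the cause is probably elsewhere.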

When you say "My setup has isolated cores", are you actually pinning your process to specific cores using a tool (e.g. hwloc)? This can greatly help to get reproducible results. Have you checked for other daemon processes that might also be eligible to run on your core?
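If you would rather pin from inside the program than use an external tool, a minimal sketch on Linux looks like this. It uses the `sched_setaffinity` call directly instead of hwloc, and `pin_current_thread` is a hypothetical helper name. It assumes the target CPU is one your process is allowed to run on.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // CPU_ZERO / CPU_SET and sched_getcpu are GNU extensions
#endif
#include <sched.h>

// Hypothetical helper: restrict the calling thread to a single CPU.
// Returns true on success, false if the kernel rejects the request.
bool pin_current_thread(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    // pid 0 means "the calling thread".
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}
```

Calling `sched_getcpu()` after pinning is a cheap sanity check that the thread really is where you expect. Note that pinning only stops your thread from moving; it does not by itself keep other tasks off that core.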

Have you tried measuring your code using a sampling profiler like gprof or HPCToolkit? These tools provide a lot more context and behavioral information that can be difficult to discover from manual instrumentation.

Upvotes: 3
