Microbenchmarking C++ on Linux

Question

I am benchmarking a function in my C++ program using inline rdtsc as the first and last instruction in the function. My setup has isolated cores, hyper-threading is off, and the frequency is 3.5 GHz.
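A minimal sketch of this kind of measurement, assuming GCC or Clang on x86-64. The helper names (`tsc_begin`, `tsc_end`, `time_call`) are hypothetical, and the fences and the use of `rdtscp` for the closing read are additions for illustration; the setup described above only states plain inline rdtsc.

```cpp
#include <cstdint>
#include <x86intrin.h>  // __rdtsc, __rdtscp, _mm_lfence (GCC/Clang, x86-64 only)

// Hypothetical helpers, not taken from the original program.

// Read the TSC at the start of the timed region.
inline std::uint64_t tsc_begin() {
    _mm_lfence();                      // let earlier instructions complete first
    const std::uint64_t t = __rdtsc();
    _mm_lfence();                      // keep the timed code from starting before the read
    return t;
}

// Read the TSC at the end of the timed region.
inline std::uint64_t tsc_end() {
    unsigned int aux = 0;              // receives IA32_TSC_AUX (on Linux it typically encodes the CPU number)
    const std::uint64_t t = __rdtscp(&aux);  // waits until earlier instructions have executed
    _mm_lfence();                      // keep later instructions from starting before the read
    return t;
}

// Time a single call of f and return the elapsed TSC ticks.
template <typename F>
std::uint64_t time_call(F&& f) {
    const std::uint64_t start = tsc_begin();
    f();
    const std::uint64_t stop = tsc_end();
    return stop - start;
}
```

Note that on CPUs with an invariant TSC the counter ticks at a fixed reference rate, so the result is in TSC ticks rather than actual core clock cycles; the two differ whenever the core is not running at the reference frequency.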

I cannot afford more than 1000 CPU cycles per call, so I count the percentage of calls that take more than 1000 CPU cycles; that is approximately 2-3%. The structure accessed in the code is huge and can certainly cause cache misses, but a cache miss costs 300-400 CPU cycles.
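The over-budget percentage can be computed along these lines. This is a sketch only: the function name `percent_over_budget` is hypothetical, and it assumes the per-call cycle counts were stored in a vector preallocated outside the timed region.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: percentage of samples strictly above the budget.
// Returns 0.0 for an empty sample set.
double percent_over_budget(const std::vector<std::uint64_t>& samples,
                           std::uint64_t budget) {
    if (samples.empty()) {
        return 0.0;
    }
    std::size_t over = 0;
    for (const std::uint64_t s : samples) {
        if (s > budget) {
            ++over;
        }
    }
    return 100.0 * static_cast<double>(over) /
           static_cast<double>(samples.size());
}
```

Keeping the raw samples, rather than only a running counter, makes it possible to look at how far above the budget the slow calls land and whether they cluster in time.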

Is there a problem with benchmarking using rdtsc? If not, what else can cause 2-3% of my calls, which go through the same set of instructions, to take an abruptly high number of cycles?

I would like help understanding what I should look for to explain this 2-3% of worst cases (WC).

