Reputation:
This simple C code first creates an array of 0xFFFFFF elements and then passes it two times, measuring how much time each pass takes:
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define TESTSIZ 0xffffff
char testcases[TESTSIZ];
void gentestcases(void)
{
size_t i = 0;
while(i < TESTSIZ)
testcases[i++] = rand()%128;
return;
}
long long time_elapsed(struct timespec beg, struct timespec end)
{
if(end.tv_nsec < beg.tv_nsec) {
end.tv_nsec += 1000000000;
end.tv_sec--;
}
return 1000000000ll*(end.tv_sec-beg.tv_sec) + end.tv_nsec-beg.tv_nsec;
}
long long test( int(*func)(int) )
{
struct timespec beg, end;
clock_gettime(CLOCK_MONOTONIC, &beg);
int volatile sink;
size_t i = 0;
while(i < TESTSIZ)
sink = islower(testcases[i++]);
clock_gettime(CLOCK_MONOTONIC, &end);
return time_elapsed(beg, end);
}
int main()
{
gentestcases();
struct timespec beg, end;
printf("1st pass took %lld nsecs\n", test(islower));
printf("2nd pass took %lld nsecs\n", test(islower));
}
I compile it with gcc -O2 -std=gnu89 -o sb sillybench.c
Usually I get the result that processing the array for the second time is slower. The effect is small but noticeable (1-3 ms) and - with a single exception - repetitive:
m@m-X555LJ ~/UVA/fastIO $ ./sb
1st pass took 13098789 nsecs
2nd pass took 13114677 nsecs
m@m-X555LJ ~/UVA/fastIO $ ./sb
1st pass took 13052105 nsecs
2nd pass took 13134187 nsecs
m@m-X555LJ ~/UVA/fastIO $ ./sb
1st pass took 13118069 nsecs
2nd pass took 13074199 nsecs
m@m-X555LJ ~/UVA/fastIO $ ./sb
1st pass took 13038579 nsecs
2nd pass took 13079995 nsecs
m@m-X555LJ ~/UVA/fastIO $ ./sb
1st pass took 13070334 nsecs
2nd pass took 13324378 nsecs
m@m-X555LJ ~/UVA/fastIO $ ./sb
1st pass took 13031000 nsecs
2nd pass took 13167349 nsecs
m@m-X555LJ ~/UVA/fastIO $ ./sb
1st pass took 13019961 nsecs
2nd pass took 13310211 nsecs
m@m-X555LJ ~/UVA/fastIO $ ./sb
1st pass took 13041332 nsecs
2nd pass took 13311737 nsecs
m@m-X555LJ ~/UVA/fastIO $ ./sb
1st pass took 13030913 nsecs
2nd pass took 13177423 nsecs
m@m-X555LJ ~/UVA/fastIO $ ./sb
1st pass took 13060570 nsecs
2nd pass took 13387024 nsecs
Why is it the case? If anything, I'd suppose that processing an array for the first time should be slower, not for the second time!
If this matters:
m@m-X555LJ ~/UVA/fastIO $ gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
System: Host: m-X555LJ Kernel: 4.4.0-21-generic x86_64 (64 bit gcc: 5.3.1)
Desktop: Cinnamon 3.0.7 (Gtk 2.24.30) Distro: Linux Mint 18 Sarah
CPU: Dual core Intel Core i5-5200U (-HT-MCP-) cache: 3072 KB
flags: (lm nx sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx) bmips: 8786
clock speeds: max: 2700 MHz 1: 2200 MHz 2: 2202 MHz 3: 2200 MHz
4: 2200 MHz
Upvotes: 12
Views: 190
Reputation: 22670
This effect is most likely caused by turbo mode (or Intel Turbo Boost Technology). Turbo mode allows a processor core to run at higher than nominal clock frequency. One factor for that is temporal bursts*. Commonly for some fraction of a second, the processor will achieve the highest frequency. It is highly likely that the first loop is running at higher clock frequency than the second.
You can confirm that, by manually setting the nominal frequency (2.20 GHz for your processor), e.g. by using cpufrequtils
or cpupower
. However, on many systems use intel_pstate
, which doesn't allow the userspace governor. Here's how you can disable turbo mode for intel_pstate
- or disable intel_pstate
all together.
Without turbo mode performance should be uniform.
*: Temperature is another factor, but i do doubt that it plays a role for 10 ms benchmark time. To illustrate, assume the CPU exceeds it's 15 W TDP and uses 20 W: Even a tiny 1 g of copper would only heat up by 0.5 K after 10 ms. I commonly see a short distinct burst (time, tens of milliseconds to few seconds) followed by a slow and steady decline (temperature, tens of seconds to minutes)
Note: gentestcases
runs for a significant amount of time (e.g. 240 ms) before the first actual test contributing to the "sprint" of the processor.
Upvotes: 5