Reputation: 131
I am trying to time different algorithms in C to determine which ones are faster. For example, consider the following (naive) function to find the sum of the even Fibonacci numbers less than a given limit:
static long a(long under) {
    long x = 0, y = 1;
    long sum = 0;
    do {
        if (!(x % 2))
            sum += x;
        const long tmp = x;
        x += y;
        y = tmp;
    } while (x < under);
    return sum;
}
It seems to me that this function should run in logarithmic time with respect to its input (the number of loop iterations should be roughly log_phi(under)).
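For what it's worth, a quick way to sanity-check that claim is the same loop with a counter instead of the timing (this helper is only an illustrative sketch, separate from the timing code below):

static long fib_iterations(long under) {
    long x = 0, y = 1;
    long n = 0;
    do {
        n++;                 /* one iteration per Fibonacci term below `under` */
        const long tmp = x;
        x += y;
        y = tmp;
    } while (x < under);
    return n;                /* 17 for 1000, 31 for 1000000, 45 for 1000000000 */
}

So the iteration count does grow roughly like log_phi(under).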
However, when I time this function, it takes less time the higher the input is. Here is a full reproducible example:
#include <stdio.h>
#include <time.h>

static long a(long under) {
    long x = 0, y = 1;
    long sum = 0;
    do {
        if (!(x % 2))
            sum += x;
        const long tmp = x;
        x += y;
        y = tmp;
    } while (x < under);
    return sum;
}

static long time_a(long value) {
    struct timespec start, end;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start);
    a(value);
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end);
    return end.tv_nsec - start.tv_nsec + (end.tv_sec - start.tv_sec) * 1000000000;
}

int main(void) {
    const long t1 = time_a(1000);
    const long t2 = time_a(1000000);
    const long t3 = time_a(1000000000);
    printf("time for 1000: %ld ns\n", t1);
    printf("time for 1000000: %ld ns\n", t2);
    printf("time for 1000000000: %ld ns\n", t3);
}
If I compile this with GCC and then run it, I get the following output:
time for 1000: 4483 ns
time for 1000000: 2564 ns
time for 1000000000: 2514 ns
What is going on here?
Edit: I am using GCC 11.2.1 with glibc 2.34, on Gentoo Linux with kernel 5.15.11. The code was compiled with gcc test.c -o test. Looking at the binary in objdump, GCC is not optimizing out the call to a when compiled this way. The code is being run on an HP ZBook 15u G4 with an Intel i5-7200U.
Upvotes: 2
Views: 202
Reputation: 213385
The condition on line 8, if (!(x % 2)), is evaluated 17 times, 31 times and 45 times for the inputs 1000, 1000000 and 1000000000 respectively.
These are simply too few iterations -- your measurement overhead is larger than the time you are trying to measure.
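A quick way to get a feel for that overhead is to time an empty interval with the same clock; whatever this prints is roughly the fixed cost baked into every one of your time_a() results (a minimal sketch, not part of your program):

#include <stdio.h>
#include <time.h>

/* Time two back-to-back clock_gettime() calls: an "empty" measurement. */
int main(void) {
    struct timespec t0, t1;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &t0);
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &t1);
    printf("empty measurement: %ld ns\n",
           (t1.tv_sec - t0.tv_sec) * 1000000000L + (t1.tv_nsec - t0.tv_nsec));
}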
To measure this reliably, invoke a() 1,000,000 times and measure the total time taken.
When I do that, and divide the time by 1e6, I get:
time for 1000: 49.81 ns
time for 1000000: 59.16 ns
time for 1000000000: 85.02 ns
which at least goes in the right direction.
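A minimal sketch of such a harness is below. The repetition count, the time_a_avg name and the volatile sink are illustrative choices, not the exact code behind the numbers above:

#include <stdio.h>
#include <time.h>

static long a(long under) {              /* same function as in the question */
    long x = 0, y = 1;
    long sum = 0;
    do {
        if (!(x % 2))
            sum += x;
        const long tmp = x;
        x += y;
        y = tmp;
    } while (x < under);
    return sum;
}

static volatile long sink;               /* consumes the results so the calls cannot be discarded */

static double time_a_avg(long value) {
    enum { ITERS = 1000000 };            /* illustrative repetition count */
    struct timespec start, end;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start);
    for (long i = 0; i < ITERS; i++)
        sink += a(value);
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end);
    const double total_ns = (end.tv_sec - start.tv_sec) * 1e9
                          + (end.tv_nsec - start.tv_nsec);
    return total_ns / ITERS;             /* average nanoseconds per call */
}

int main(void) {
    printf("time for 1000: %.2f ns\n", time_a_avg(1000));
    printf("time for 1000000: %.2f ns\n", time_a_avg(1000000));
    printf("time for 1000000000: %.2f ns\n", time_a_avg(1000000000));
}

Compiled with optimization enabled (e.g. gcc -O2), each printed value is then an average over a million calls instead of a single sub-microsecond measurement.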
P.S. Measuring performance of unoptimized code is a fool's errand: the compiler may introduce all kinds of artifacts.
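That is also why the sketch above feeds every result into a volatile variable: once optimization is on, GCC is allowed to remove a call to a static function whose result is never used, and you could end up timing an empty loop, or nothing at all.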
Upvotes: 1