Benji Dial

Reputation: 131

clock_gettime returning weird results

I am trying to time different algorithms in C to determine which ones are faster. For example, consider the following (naive) function that sums the even Fibonacci numbers below a given limit:

static long a(long under) {
  long x = 0, y = 1;
  long sum = 0;
  do {
    if (!(x % 2))
      sum += x;
    const long tmp = x;
    x += y;
    y = tmp;
  } while (x < under);
  return sum;
}

This function seems to me like it should run in logarithmic time with respect to the input (the number of loop iterations should be something like log_phi(under)).
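
(Sketch of where that estimate comes from: the n-th Fibonacci number is roughly phi^n / sqrt(5), so the loop runs until phi^n / sqrt(5) >= under, which takes about log_phi(under) iterations, give or take a constant -- only a few dozen even for the largest input below.)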

However, when I time this function, it takes less time for larger inputs. Here is a full reproducible example:

#include <stdio.h>
#include <time.h>

static long a(long under) {
  long x = 0, y = 1;
  long sum = 0;
  do {
    if (!(x % 2))
      sum += x;
    const long tmp = x;
    x += y;
    y = tmp;
  } while (x < under);
  return sum;
}

static long time_a(long value) {
  struct timespec start, end;
  clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start);
  a(value);
  clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end);
  return end.tv_nsec - start.tv_nsec + (end.tv_sec - start.tv_sec) * 1000000000;
}

int main(void) {
  const long t1 = time_a(1000);
  const long t2 = time_a(1000000);
  const long t3 = time_a(1000000000);

  printf("time for       1000: %ld ns\n", t1);
  printf("time for    1000000: %ld ns\n", t2);
  printf("time for 1000000000: %ld ns\n", t3);
}

If I compile this with GCC and then run it, I get the following output:

time for       1000: 4483 ns
time for    1000000: 2564 ns
time for 1000000000: 2514 ns

What is going on here?

Edit: I am using GCC 11.2.1 with glibc 2.34, on Gentoo Linux with kernel 5.15.11. The code was compiled with gcc test.c -o test. Looking at the binary with objdump, I can see that GCC is not optimizing out the call to a when compiled this way. The code is being run on an HP ZBook 15u G4 with an Intel i5-7200U.

Upvotes: 2

Views: 202

Answers (1)

Employed Russian

Reputation: 213385

The condition on line 8, if (!(x % 2)), is evaluated 17, 31 and 45 times for the inputs 1000, 1000000 and 1000000000 respectively.

These are simply too few iterations -- your measurement overhead is larger than the time you are trying to measure.
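
As a rough illustration (a minimal sketch, not the exact program I used), time an empty measurement -- two back-to-back clock_gettime calls with nothing in between:

#include <stdio.h>
#include <time.h>

int main(void) {
  struct timespec start, end;
  /* Nothing happens between the two readings; the difference is pure measurement overhead. */
  clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start);
  clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end);
  printf("empty measurement: %ld ns\n",
         end.tv_nsec - start.tv_nsec + (end.tv_sec - start.tv_sec) * 1000000000);
  return 0;
}

Whatever that prints is often a substantial fraction of the few thousand nanoseconds your program reports.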

To measure this reliably, invoke a() 1,000,000 times and measure the total time taken.
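
Something along these lines works when added next to time_a in your program (a sketch only -- the name time_a_repeated and the volatile sink are just for illustration; the sink keeps the compiler from discarding the calls):

static long time_a_repeated(long value, long reps) {
  struct timespec start, end;
  volatile long sink = 0;  /* consume the results so the calls cannot be dropped */
  clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start);
  for (long i = 0; i < reps; i++)
    sink += a(value);
  clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end);
  /* total time in nanoseconds for all reps calls */
  return end.tv_nsec - start.tv_nsec + (end.tv_sec - start.tv_sec) * 1000000000;
}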

When I do that, and divide the time by 1e6, I get:

time for       1000:    49.81 ns
time for    1000000:    59.16 ns
time for 1000000000:    85.02 ns

which at least goes in the right direction.

P.S. Measuring performance of unoptimized code is a fool's errand: the compiler may introduce all kinds of artifacts.
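
For example, rebuild with optimization enabled and keep consuming the result (the volatile sink above does that), so the call to a cannot simply be dropped:

gcc -O2 test.c -o test

Even then, with constant arguments the compiler may precompute part of the work, so it is worth re-checking the generated code with objdump as you already did.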

Upvotes: 1
