jduriez
jduriez

Reputation: 15

Global time cost versus sum of local time costs -- "for" loop

As silly as it seems, I would like to know whether there may be pitfalls when trying to reconcile the time costs for a for loop, as measured

The example below illustrates my difficulties getting two equal measurements:

#include <iostream>
#include <vector> // std::vector
#include <ctime> // clock(), ..

int main(){
  clock_t clockStartLoop;
  double timeInternal(0)// the time cost of the loop, summing all time costs of commands within the "for" loop
    , timeExternal // time cost of the loop, as measured outside the boundaries of "for" loop
    ;
  std::vector<int> vecInt; // will be [0,1,..,10000] after the loop below
  clock_t costExternal(clock());
  for(int i=0;i<10000;i++){
    clockStartLoop = clock();
    vecInt.push_back(i);
    timeInternal += clock() - clockStartLoop; // incrementing internal time cost
  }
  timeInternal /= CLOCKS_PER_SEC;
  timeExternal = (clock() - costExternal)/(double)CLOCKS_PER_SEC;

  std::cout << "timeExternal = "<< timeExternal << " s ";
  std::cout << "vs timeInternal = " << timeInternal << std::endl;
  std::cout << "We have a ratio of " << timeExternal/timeInternal << " between the two.." << std::endl;
}

I usually get a ratio around 2 as output e.g.

timeExternal = 0.008407 s vs timeInternal = 0.004287 We have a ratio of 1.96105 between the two..

, whereas I was hoping a ratio closer to 1.

I'm actually dealing with a more complex code and I would like to isolate time costs within a loop, being sure that all the time slices I consider do make up a complete pie (which I never achieved until now..). Thanks a lot

Upvotes: 0

Views: 56

Answers (1)

rustyx
rustyx

Reputation: 85531

timeExternal = 0.008407 s vs timeInternal = 0.004287 We have a ratio of 1.96105 between the two..

A ratio of ~2 is to be expected - by far the heaviest call in your loop is clock() itself (on most systems clock() is a syscall to the kernel).

Imagine that clock() implementation looks like the following pseudocode:

clock_t clock() {
    go_to_kernel(); // very long operation
    clock_t rc = query_process_clock();
    return_from_kernel(); // very long operation
    return rc;
}

Now going back to the loop, we can annotate the places where time is spent:

  for(int i=0;i<10000;i++){
    // go_to_kernel - very long operation
    clockStartLoop = clock();
    // return_from_kernel - very long operation
    vecInt.push_back(i);
    // go_to_kernel - very long operation
    timeInternal += clock() - clockStartLoop;
    // return_from_kernel - very long operation
  }

So between the two calls to clock() we have 2 long operations, with a total in the loop of 4. Hence the ratio of 2-to-1.

  • Is it just because there are operations internal to the loop which are not measured by the clock() difference (such as incrementing timeInternal) ?

No, incrementing timeInterval is negligible.

  • Could the i++ operation in the for(..) be non-negligible in the external measurement and also explain the difference with the internal one ?

No, i++ is also negligible. Remove the inner calls to clock() and you will see a much faster execution time. On my system it was 0.00003 s.

The next most expensive operation after clock() is vector::push_back(), because it needs to resize the vector. This is amortized by a quadratic growth factor and can be eliminated entirely by calling vector::reserve() before entering the loop.

Conclusion: when benchmarking, make sure to time entire loops, not individual iterations. Better yet, use frameworks like Google Benchmark, which will help to avoid many other pitfalls (like compiler optimizations). There's also quick-bench.com for simple cases (based on Google Benchmark).

Upvotes: 1

Related Questions