Reputation: 15
As silly as it seems, I would like to know whether there may be pitfalls when trying to reconcile the time costs for a for
loop, as measured
for
loop (global or external time cost)The example below illustrates my difficulties getting two equal measurements:
#include <iostream>
#include <vector> // std::vector
#include <ctime> // clock(), ..
int main(){
clock_t clockStartLoop;
double timeInternal(0)// the time cost of the loop, summing all time costs of commands within the "for" loop
, timeExternal // time cost of the loop, as measured outside the boundaries of "for" loop
;
std::vector<int> vecInt; // will be [0,1,..,10000] after the loop below
clock_t costExternal(clock());
for(int i=0;i<10000;i++){
clockStartLoop = clock();
vecInt.push_back(i);
timeInternal += clock() - clockStartLoop; // incrementing internal time cost
}
timeInternal /= CLOCKS_PER_SEC;
timeExternal = (clock() - costExternal)/(double)CLOCKS_PER_SEC;
std::cout << "timeExternal = "<< timeExternal << " s ";
std::cout << "vs timeInternal = " << timeInternal << std::endl;
std::cout << "We have a ratio of " << timeExternal/timeInternal << " between the two.." << std::endl;
}
I usually get a ratio around 2 as output e.g.
timeExternal = 0.008407 s vs timeInternal = 0.004287 We have a ratio of 1.96105 between the two..
, whereas I was hoping a ratio closer to 1.
timeInternal
) ? i++
operation in the for(..)
be non-negligible in the external measurement and also explain the difference with the internal one ?I'm actually dealing with a more complex code and I would like to isolate time costs within a loop, being sure that all the time slices I consider do make up a complete pie (which I never achieved until now..). Thanks a lot
Upvotes: 0
Views: 56
Reputation: 85531
timeExternal = 0.008407 s vs timeInternal = 0.004287 We have a ratio of 1.96105 between the two..
A ratio of ~2 is to be expected - by far the heaviest call in your loop is clock()
itself (on most systems clock()
is a syscall to the kernel).
Imagine that clock()
implementation looks like the following pseudocode:
clock_t clock() {
go_to_kernel(); // very long operation
clock_t rc = query_process_clock();
return_from_kernel(); // very long operation
return rc;
}
Now going back to the loop, we can annotate the places where time is spent:
for(int i=0;i<10000;i++){
// go_to_kernel - very long operation
clockStartLoop = clock();
// return_from_kernel - very long operation
vecInt.push_back(i);
// go_to_kernel - very long operation
timeInternal += clock() - clockStartLoop;
// return_from_kernel - very long operation
}
So between the two calls to clock()
we have 2 long operations, with a total in the loop of 4. Hence the ratio of 2-to-1.
- Is it just because there are operations internal to the loop which are not measured by the
clock()
difference (such as incrementingtimeInternal
) ?
No, incrementing timeInterval
is negligible.
- Could the
i++
operation in thefor(..)
be non-negligible in the external measurement and also explain the difference with the internal one ?
No, i++
is also negligible. Remove the inner calls to clock()
and you will see a much faster execution time. On my system it was 0.00003 s.
The next most expensive operation after clock()
is vector::push_back()
, because it needs to resize the vector
. This is amortized by a quadratic growth factor and can be eliminated entirely by calling vector::reserve()
before entering the loop.
Conclusion: when benchmarking, make sure to time entire loops, not individual iterations. Better yet, use frameworks like Google Benchmark, which will help to avoid many other pitfalls (like compiler optimizations). There's also quick-bench.com for simple cases (based on Google Benchmark).
Upvotes: 1