Reputation: 619
I got curious with the difference between a for(;;)
and for(:)
,
particularly with the speed between the two. So I ran a little test by having a vector of 10 million integers and adding them all together in a for. I found that the for(:)
was 1.3 slower.
What would cause the for(:)
to be that much slower!?
EDIT: It seems like the for(:) uses the iterator of the vector unlike the for(;;) making it longer.
/Yu"stdafx.h" /GS /analyze- /W3 /Zc:wchar_t /ZI /Gm /Od /sdl /Fd"Debug\vc120.pdb" /fp:precise /D "WIN32" /D "_DEBUG" /D "_CONSOLE" /D "_LIB" /D "_UNICODE" /D "UNICODE" /errorReport:prompt /WX- /Zc:forScope /RTC1 /Gd /Oy- /MDd /Fa"Debug\" /EHsc /nologo /Fo"Debug\" /Fp"Debug\forvsForLoop.pch"
#include "stdafx.h"
#include <vector>
#include <iostream>
#include <chrono>
void init(std::vector<int> &array){
srand(20);
for (int x = 0; x < 10000000; x++)
array.push_back(rand());
return;
}
unsigned long testForLoop(std::vector<int> &array){
unsigned long result = 0;
for (int x = 0; x < array.size(); x++)
result += array[x];
return result;
}
unsigned long testFor(std::vector<int> &array){
unsigned long result = 0;
for (const int &element : array)
result += element;
return result;
}
int _tmain(int argc, _TCHAR* argv[])
{
std::vector<int> testingArray;
init(testingArray);
//Warm up
std::cout << "warming up \n";
testForLoop(testingArray);
testFor(testingArray);
testForLoop(testingArray);
testFor(testingArray);
testForLoop(testingArray);
testFor(testingArray);
std::cout << "starting \n";
auto start = std::chrono::high_resolution_clock::now();
testForLoop(testingArray);
auto end = std::chrono::high_resolution_clock::now();
std::cout << "ForLoop took: " << std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count() << std::endl;
start = std::chrono::high_resolution_clock::now();
testFor(testingArray);
end = std::chrono::high_resolution_clock::now();
std::cout << "For---- took: " << std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count() << std::endl;
system("pause");
return 0;
}
Upvotes: 1
Views: 178
Reputation: 20759
The answer is a guess, being subjective to the exact code and optimization used. Also the underlying platform can change how code behavior can work.
There are essentialkly two "low level" ways to manage an iteration: one is based on a "reassignable pointer", the other is based on a "constant pointer and an offset".
In pseudocode
loop { *a = *b; ++a; ++b; }
vs
loop { a[i] = b[i]; ++i; }
Depending on the processor architecture, the two are not the same having a different behavior respect to the use of registers, address locality and caches: the first has two sums with a memory-held constant, the second has two sums with a register, and a register increment. (and both have a memory copy)
On x86 platforms, the second is preferable, since does less memory access, and uses instructions requiring less memory fetches.
Now, an iterator based loop applied to a vector (whose iterators wrap pointers) lead to the first form, while a traditional index based loop leads to the second form.
Now for(a: v) { .... }
is the same as for(auto i=v.begin(); i!=v.end(); ++i) { auto& a=*i; ... }
It works with whatever form of container (also not memory sequential) but cannot reduce to index based. Unless compiler optimization are so good to find out that the iterator is actually a pointer moving by constant increments.
Upvotes: 1
Reputation: 264669
To make sure that the test are not optimized out I printed out the result:
auto x = testForLoop(......
// ^^^
......nd - start).count() << " R: " << x << std::endl;
// ^^^^^^^^^^^^^^^^
Normal mode: (approx half speed)
> g++ -std=c++11 v.cpp
> ./a.out
warming up
starting
ForLoop took: 33262788 R: 10739647121123056
For---- took: 51263111 R: 10739647121123056
Optimized: (practically identical)
> g++ -O3 -std=c++11 v.cpp
> ./a.out
warming up
starting
ForLoop took: 4861314 R: 10739647121123056
For---- took: 4997957 R: 10739647121123056
Upvotes: 2
Reputation: 56539
The standard doesn't say anything about performance or implementation. Both loops should work properly and also the performance should be equal in normal situations. No one can say why it's too slow in MSVC++ unless he claim it's a bug or bad implementation. Maybe you should change the optimization settings correctly.
I've tested your code in MSVC++, GCC and Clang.
GCC output
ForLoop took: 7879773
For---- took: 5786831
Clang output
ForLoop took: 6537441
For---- took: 6743614
and MSVC++ output
ForLoop took: 77786200
For---- took: 249612200
Both GCC and Clang have reasonable results and two loops are close to each other as expected. But the MSVC++'s result is vague and unrealistic. I call it a bug or regression. Or, your bad configuration to compile, try another optimization settings.
Upvotes: 3
Reputation: 5083
If you are using:
for ( auto x : ... )
Then each x is a copy. Less overhead might be:
for ( const auto & x : ... )
Upvotes: 3