Joshua Waring
Joshua Waring

Reputation: 619

why the large difference, between for loops

I got curious with the difference between a for(;;) and for(:), particularly with the speed between the two. So I ran a little test by having a vector of 10 million integers and adding them all together in a for. I found that the for(:) was 1.3 slower.

What would cause the for(:) to be that much slower!?

EDIT: It seems like the for(:) uses the iterator of the vector unlike the for(;;) making it longer.

/Yu"stdafx.h" /GS /analyze- /W3 /Zc:wchar_t /ZI /Gm /Od /sdl /Fd"Debug\vc120.pdb" /fp:precise /D "WIN32" /D "_DEBUG" /D "_CONSOLE" /D "_LIB" /D "_UNICODE" /D "UNICODE" /errorReport:prompt /WX- /Zc:forScope /RTC1 /Gd /Oy- /MDd /Fa"Debug\" /EHsc /nologo /Fo"Debug\" /Fp"Debug\forvsForLoop.pch"

#include "stdafx.h"
#include <vector>
#include <iostream>
#include <chrono>

void init(std::vector<int> &array){
    srand(20);
    for (int x = 0; x < 10000000; x++)
        array.push_back(rand());
    return;
}

unsigned long testForLoop(std::vector<int> &array){
    unsigned long result = 0;
    for (int x = 0; x < array.size(); x++)
        result += array[x];
    return result;
}
unsigned long testFor(std::vector<int> &array){
    unsigned long result = 0;
    for (const int &element : array)
        result += element;
    return result;
}
int _tmain(int argc, _TCHAR* argv[])
{
    std::vector<int> testingArray;

    init(testingArray);

    //Warm up
    std::cout << "warming up \n";
    testForLoop(testingArray);
    testFor(testingArray);
    testForLoop(testingArray);
    testFor(testingArray);
    testForLoop(testingArray);
    testFor(testingArray);
    std::cout << "starting \n";

    auto start = std::chrono::high_resolution_clock::now();
    testForLoop(testingArray);
    auto end = std::chrono::high_resolution_clock::now();
    std::cout << "ForLoop took: " <<  std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count() << std::endl;


    start = std::chrono::high_resolution_clock::now();
    testFor(testingArray);
    end = std::chrono::high_resolution_clock::now();
    std::cout << "For---- took: " << std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count() << std::endl;

    system("pause");
    return 0;

}

Upvotes: 1

Views: 178

Answers (4)

Emilio Garavaglia
Emilio Garavaglia

Reputation: 20759

The answer is a guess, being subjective to the exact code and optimization used. Also the underlying platform can change how code behavior can work.

There are essentialkly two "low level" ways to manage an iteration: one is based on a "reassignable pointer", the other is based on a "constant pointer and an offset".

In pseudocode

loop { *a = *b; ++a; ++b; }

vs

loop { a[i] = b[i]; ++i; }

Depending on the processor architecture, the two are not the same having a different behavior respect to the use of registers, address locality and caches: the first has two sums with a memory-held constant, the second has two sums with a register, and a register increment. (and both have a memory copy)

On x86 platforms, the second is preferable, since does less memory access, and uses instructions requiring less memory fetches.

Now, an iterator based loop applied to a vector (whose iterators wrap pointers) lead to the first form, while a traditional index based loop leads to the second form.

Now for(a: v) { .... } is the same as for(auto i=v.begin(); i!=v.end(); ++i) { auto& a=*i; ... }

It works with whatever form of container (also not memory sequential) but cannot reduce to index based. Unless compiler optimization are so good to find out that the iterator is actually a pointer moving by constant increments.

Upvotes: 1

Loki Astari
Loki Astari

Reputation: 264669

To make sure that the test are not optimized out I printed out the result:

 auto x = testForLoop(......

 // ^^^
 ......nd - start).count() << "  R: " << x << std::endl;

                          //  ^^^^^^^^^^^^^^^^

Normal mode: (approx half speed)

> g++ -std=c++11 v.cpp
> ./a.out
warming up
starting
ForLoop took: 33262788  R: 10739647121123056
For---- took: 51263111   R: 10739647121123056

Optimized: (practically identical)

> g++ -O3 -std=c++11 v.cpp
> ./a.out
warming up
starting
ForLoop took: 4861314  R: 10739647121123056
For---- took: 4997957   R: 10739647121123056

Upvotes: 2

masoud
masoud

Reputation: 56539

The standard doesn't say anything about performance or implementation. Both loops should work properly and also the performance should be equal in normal situations. No one can say why it's too slow in MSVC++ unless he claim it's a bug or bad implementation. Maybe you should change the optimization settings correctly.

I've tested your code in MSVC++, GCC and Clang.

GCC output

ForLoop took: 7879773
For---- took: 5786831

Clang output

ForLoop took: 6537441
For---- took: 6743614

and MSVC++ output

ForLoop took: 77786200
For---- took: 249612200

Both GCC and Clang have reasonable results and two loops are close to each other as expected. But the MSVC++'s result is vague and unrealistic. I call it a bug or regression. Or, your bad configuration to compile, try another optimization settings.

Upvotes: 3

woolstar
woolstar

Reputation: 5083

If you are using:

for ( auto x : ... )

Then each x is a copy. Less overhead might be:

for ( const auto & x : ... )

Upvotes: 3

Related Questions