Reputation: 372
I know that vectorization can only take place if the objects being accessed are contiguous in memory. I have created a struct which has pointer and then I create a vector of this struct and to ensure that the pointers inside the vector of objects are pointing to contiguous data blocks I set them to point to elements in a vector of double with the same size.
#include <iostream>
#include <vector>
struct Vec {
Vec() {}
double* a;
};
int main(int argc, char* argv[]) {
std::vector<double> vec_double(10000000, 1.0);
std::vector<Vec> vec_vec(10000000);
for (unsigned i = 0; i < 10000000; ++i)
vec_vec[i].a = &(vec_double[i]);
// Why is this loop not vectorized
for (unsigned i = 0; i < 10000000; ++i)
vec_double[i] += *(vec_vec[i].a);
double sum = 0.0;
for (unsigned i = 0; i < 10000000; ++i)
sum += vec_double[i];
std::cout << sum << std::endl;
return 0;
}
However, even with O3 optimization the loop at line number 16 is not getting vectorized. Can someone please explain why is this happening?
Upvotes: 0
Views: 1924
Reputation: 1684
First of all, it's not true that "vectorization can only take place if the objects being accessed are contiguous in memory". It's quite possible to vectorize loops with non-contiguous memory accesses, and it's even hardware-level supported on newer CPUs like Haswell, where you have v(p)gather* instructions supporting "random memory access pattern" codes vectorization.
However, secondly, vectorizing codes with non-contiguous access is often very inefficient. Contiguous (also called "unit-stride" in this context) memory access is mostly always prefferable to have in vector code (although contiguous access patterns could be worse from memory bandwidth perspective, but it's separate long topic). So your optimization technique was appropriate.
Now, third, access pattern is just one of tens and hundreds of other reasons why compiler didn't vectorize the loop automatically or even didn't vectorize it after nudging it using some pragmas/etc. In your given case, there could be 2 reasons why gcc was confused by given code:
Both issues are usually well solvable using restrict keyword (not your case I guess), #pragma ivdep or #pragma omp simd. The thing is however, that pragmas ivdep and simd are only "reliably" supported starting from GCC version 4.9.
One another way is to rewrite your code so that it doesn't deal with pointers and maybe even vectors, but just use some C style array of fixed size. Of course it's terrible recommendations for real C++ codes, but in your given code snippet it's doable, since you statically specify array size.
Upvotes: 0
Reputation: 93
You should get the raw pointer from the iterator first. Like this vec_vec.begin()_Ptr
and vec_double.begin()._Ptr
. Work with those pointers like in legacy. And finaly declare method that it does no aliasing. Like this __declspec(noalias)
. And it should do the trick on windows with msvc. I don`t think there is a noalias attribute on GCC though.
Upvotes: 0
Reputation: 28050
Just guessing here, but when just looking at this specific loop, the compiler does not know that vec_vec[i].a
points to the memory location next to vec_vec[i+1].a
. Therefore, it cannot do the calculation without dereferencing each .a
member separately.
It could know that, when looking at the loop above. But if it would do that, it could also look at the loop below, calculate the final result and print it.
Upvotes: 3