Reputation: 1997
In Matlab we use vectorization to speed up code. For example, here are two ways of performing the same calculation:
% Loop
tic
i = 0;
for t = 0:.01:1e5
    i = i + 1;
    y(i) = sin(t);
end
toc
% Vectorization
tic
t = 0:.01:1e5;
y = sin(t);
toc
The results are:
Elapsed time is 1.278207 seconds. % For loop
Elapsed time is 0.099234 seconds. % Vectorization
So the vectorized code is almost 13 times faster. Actually, if we run it again we get:
Elapsed time is 0.200800 seconds. % For loop
Elapsed time is 0.103183 seconds. % Vectorization
The vectorized code is now only about twice as fast instead of 13 times as fast. So the for loop is much slower on its first run, but on later runs the gap narrows, apparently because Matlab recognizes that the loop hasn't changed and optimizes it. In any case, the vectorized code is still twice as fast as the for loop.
Now I have started using C++ and I am wondering about vectorization in this language. Do we need to vectorize for loops in C++, or are they already fast enough? Maybe the compiler vectorizes them automatically? Actually, I don't know if Matlab-style vectorization is even a concept in C++; maybe it's only needed in Matlab because it is an interpreted language? How would you write the above function in C++ to make it as efficient as possible?
Upvotes: 5
Views: 3541
Reputation: 238341
Do we need vectorization in C++?
Vectorisation is not always needed, but it can make some programs faster.
C++ compilers support auto-vectorisation, but not every loop can be vectorised automatically, so if you need vectorisation you may not be able to rely on that optimisation alone.
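As a rough sketch of what the loop from the question could look like in C++: the function name sample_sine and its parameters are made up for illustration. Whether a given compiler actually vectorises the std::sin call depends on the compiler, its flags, and the math library, so treat that as an assumption rather than a guarantee.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original post): evaluates sin at
// t = 0, step, 2*step, ... for `count` samples.
std::vector<double> sample_sine(std::size_t count, double step)
{
    assert(step > 0.0);
    std::vector<double> y(count);  // allocate once instead of growing in the loop
    for (std::size_t i = 0; i < count; ++i) {
        // Each iteration is independent of the others, which is what
        // gives the optimiser a chance to vectorise the loop.
        y[i] = std::sin(static_cast<double>(i) * step);
    }
    return y;
}
```

Calling sample_sine(10000001, 0.01) would cover the same range as 0:.01:1e5 in the Matlab code. Computing t from the index, rather than accumulating t += step, keeps the iterations independent.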
are [loops] already fast enough?
That depends on the loop, the target CPU, the compiler and its options, and, crucially, on how fast it needs to be.
Some things that you could do to potentially achieve vectorisation in standard C++:
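Use the standard algorithms with an execution policy such as std::execution::parallel_unsequenced_policy or std::execution::unsequenced_policy.
Make sure the data is sufficiently aligned for SIMD instructions, for example with alignas. See the manual of the target CPU for what alignment you need.
Partially unroll the loop by hand (this form assumes count is a multiple of 4):
for (int i = 0; i < count; i += 4) {
    operation(i + 0);
    operation(i + 1);
    operation(i + 2);
    operation(i + 3);
}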
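To make the alignment and unrolling points concrete, here is a minimal sketch. The names Block8 and add_unrolled are hypothetical, and the 32-byte alignment is an assumption that happens to suit 256-bit vector registers; the right value depends on the target CPU.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical example type: 8 floats aligned to 32 bytes.
// The value 32 is an assumption; check the target CPU manual.
struct alignas(32) Block8 {
    float v[8];
};

// Adds b to a element-wise, manually unrolled by four.
// Precondition: count is a multiple of 4.
void add_unrolled(float* a, const float* b, std::size_t count)
{
    assert(count % 4 == 0);
    for (std::size_t i = 0; i < count; i += 4) {
        a[i + 0] += b[i + 0];
        a[i + 1] += b[i + 1];
        a[i + 2] += b[i + 2];
        a[i + 3] += b[i + 3];
    }
}
```

Neither alignas nor unrolling forces the compiler to emit SIMD instructions; they only remove obstacles, so it is worth checking the generated assembly or the compiler's vectorisation report.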
Outside of standard, portable C++, there are implementation specific ways:
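Some compilers provide language extensions for writing explicitly vectorised code, for example the GCC vector extensions:
using v4si = int __attribute__ ((vector_size (16)));
v4si a, b, c;
a = b + 1;    /* a = b + {1,1,1,1}; */
a = 2 * b;    /* a = {2,2,2,2} * b; */
Some compilers support OpenMP, which provides the following directive:
#pragma omp simd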
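The v4si snippet above can be fleshed out into something that compiles. This is only a sketch: it relies on the vector_size attribute, which is a GCC/Clang extension and not standard C++, and the function name twice_plus_one is made up for illustration.

```cpp
#include <cassert>

// GCC/Clang extension, not standard C++: four 32-bit ints in 16 bytes.
using v4si = int __attribute__((vector_size(16)));

// Hypothetical helper: computes 2 * b + 1 on all four lanes at once.
v4si twice_plus_one(v4si b)
{
    v4si a = 2 * b;  // the scalar is broadcast: {2,2,2,2} * b
    return a + 1;    // a + {1,1,1,1}
}
```

Passing {1, 2, 3, 4} yields {3, 5, 7, 9}. Individual lanes can be read with ordinary subscripting, e.g. result[0].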
Upvotes: 5