Reputation: 147
This is my naive implementation of dot product:
float simple_dot(int N, float *A, float *B) {
    float dot = 0;
    for(int i = 0; i < N; ++i) {
        dot += A[i] * B[i];
    }
    return dot;
}
And this is using the C++ library:
float library_dot(int N, float *A, float *B) {
    return std::inner_product(A, A+N, B, 0);
}
I ran some benchmarks (the code is here: https://github.com/ijklr/sse), and the library version is a lot slower.
My compiler flags are -Ofast -march=native
Upvotes: 3
Views: 1602
Reputation: 476940
Your two functions don't do the same thing. The algorithm uses an accumulator whose type is deduced from the initial value, which in your case (0) is int. Accumulating floats into an int does not just take longer than accumulating into a float, but also produces a different result.
The equivalent of your raw loop code is to use the initial value 0.0f, or equivalently float{}.
(Note that std::accumulate behaves very similarly in this regard.)
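To make the difference concrete, here is a minimal sketch comparing the two initial values. The function names (dot_int_init, dot_float_init, sum_int_init, sum_float_init) are made up for illustration and are not from the question's repository; only std::inner_product and std::accumulate are real library calls.

```cpp
#include <numeric>

// Hypothetical helper: the initial value 0 is an int, so the accumulator
// is an int and each partial sum is truncated when it is stored back.
float dot_int_init(int n, const float* a, const float* b) {
    return std::inner_product(a, a + n, b, 0);
}

// Hypothetical helper: the initial value 0.0f is a float, so the
// accumulator is a float, matching the hand-written loop.
float dot_float_init(int n, const float* a, const float* b) {
    return std::inner_product(a, a + n, b, 0.0f);
}

// std::accumulate deduces its accumulator type from the initial value
// in the same way.
float sum_int_init(int n, const float* a) {
    return std::accumulate(a, a + n, 0);
}

float sum_float_init(int n, const float* a) {
    return std::accumulate(a, a + n, 0.0f);
}
```

For example, with both inputs equal to {0.5f, 0.5f}, dot_int_init returns 0 while dot_float_init returns 0.5, because every partial sum of 0.25 is truncated to 0 when it is converted back to int.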
Upvotes: 8