Fastest way to multiply and add to an int array in C by an int?

Question

I'm learning a bit about pthreads and performance when programming in C and I'd like to know what the best way is to add a single int to all elements in an array (quite a large one with around 5000 elements) as well as the best way to multiply an int to all elements in an array.

I tried doing this in parallel however there wasn't much of an improvement. My parallel method was to pass a struct containing the value to add/multiply by as well as a pointer to the array. I passed this as argument into pthread_create. In the function that was called, I added/multiplied the given value to all the elements in the array.

I feel as though there is a better way to multiply or add a single number to all 5000 elements (or more) in the array. I've also heard of those SIMD commands. Would that possibly help in this situation?

Paul R · Accepted Answer

If you can assume an x86 CPU then you can use Intel's SSE SIMD extensions to process 4 elements at a time.

E.g. to add a value to all elements of an array:

#include "emmintrin.h"

// ...

const __m128i vinc = _mm_set1_epi32(inc);  // init vector containing value to add
for (int i = 0; i < N; i += 4)
{
    __m128i v = _mm_loadu_si128(&a[i]);    // load 4 elements from array a
    v = _mm_add_epi32(v, vinc);            // add increment to each element
    _mm_storeu_si128(&a[i], v);            // save 4 modified elements back to a
}

On newer CPUs, e.g. Haswell, you can use AVX2 to process 8 elements per iteration, in a similar fashion.

Note that some compilers will already vectorize this code for you, e.g. gcc, clang, ICC, and even recent versions of Visual Studio (on a good day), so you may not even need to code this explicitly with SSE intrinsics.

There are also optimised libraries that will perform operations such as this for you, e.g. Intel's IPP, or Apple's Accelerate framework, and many other open source libraries.

And the usual caveat about premature optimisation applies of course: you should benchmark your existing code first, and be certain that it is a performance bottleneck, before attempting to optimise it.

Fastest way to multiply and add to an int array in C by an int?

Answers (2)

Related Questions