Reputation: 3
I'm learning a bit about pthreads and performance when programming in C and I'd like to know what the best way is to add a single int
to all elements in an array (quite a large one with around 5000 elements) as well as the best way to multiply an int
to all elements in an array.
I tried doing this in parallel however there wasn't much of an improvement. My parallel method was to pass a struct
containing the value to add/multiply by as well as a pointer to the array. I passed this as argument into pthread_create
. In the function that was called, I added/multiplied the given value to all the elements in the array.
I feel as though there is a better way to multiply or add a single number to all 5000 elements (or more) in the array. I've also heard of those SIMD commands. Would that possibly help in this situation?
Upvotes: 0
Views: 453
Reputation: 212979
If you can assume an x86 CPU then you can use Intel's SSE SIMD extensions to process 4 elements at a time.
E.g. to add a value to all elements of an array:
#include "emmintrin.h"
// ...
const __m128i vinc = _mm_set1_epi32(inc); // init vector containing value to add
for (int i = 0; i < N; i += 4)
{
__m128i v = _mm_loadu_si128(&a[i]); // load 4 elements from array a
v = _mm_add_epi32(v, vinc); // add increment to each element
_mm_storeu_si128(&a[i], v); // save 4 modified elements back to a
}
On newer CPUs, e.g. Haswell, you can use AVX2 to process 8 elements per iteration, in a similar fashion.
Note that some compilers will already vectorize this code for you, e.g. gcc, clang, ICC, and even recent versions of Visual Studio (on a good day), so you may not even need to code this explicitly with SSE intrinsics.
There are also optimised libraries that will perform operations such as this for you, e.g. Intel's IPP, or Apple's Accelerate framework, and many other open source libraries.
And the usual caveat about premature optimisation applies of course: you should benchmark your existing code first, and be certain that it is a performance bottleneck, before attempting to optimise it.
Upvotes: 5
Reputation: 4859
You have to split the array using a divide and conquer approach. Create multiple threads and give each thread a part of the array. So create 5 threads and give the threads the elements 0..999 1000..1999 ...
You havt to use multiple threads to solve your problem, otherwise there will be no performance gain.
On a sidenote: The array is in my opinion too small to show any improvement over for the multi-threaded implementation over the straight forward one.
Upvotes: 0