Reputation: 51
I have tried to apply #pragma omp simd to the following code (loops), but it does not seem to work (no speed improvement). I also tried #pragma omp simd linear, but all my attempts resulted in a seg fault.
https://github.com/Rdatatable/data.table/blob/master/src/fsort.c#L209 https://github.com/Rdatatable/data.table/blob/master/src/fsort.c#L184
Is it even possible to increment a vector with SIMD? Example:
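#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int len = 1000;
    int tmp[len];                      /* indices to count, all in [0, 100) */
    for (int i = 0; i < len; ++i) {
        tmp[i] = rand() % 100;
    }
    int *thisCounts = (int *) calloc(len, sizeof(int)); /* zero-initialized counters */
    if (thisCounts == NULL) {
        return 1;
    }
    /* histogram loop: this indirect increment is what I want to vectorize */
    for (int j = 0; j < len; ++j) {
        thisCounts[tmp[j]]++;
    }
    for (int j = 0; j < len; ++j) {
        printf("%d, ", thisCounts[j]);
    }
    free(thisCounts);
    return 0;
}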
FYI, line 209 is the one that takes the most time and that I am trying to improve.
Thank you
Upvotes: 0
Views: 313
Reputation: 50816
It depends on the target hardware architecture. Many processor architectures do not have SIMD instructions performing this kind of indirect access. On mainstream x86-64 processors, AVX2 provides gather instructions (scatter instructions only came with AVX-512). However, they are not efficiently implemented and thus not significantly faster than the equivalent non-SIMD instructions. Moreover, using them is difficult here because of possible increment conflicts: if tmp[j1] == tmp[j2] with j1 != j2 within the same vector, both lanes read the same old count and one of the increments is lost. The AVX-512 instruction set contains interesting instructions for that (such as the conflict-detection instruction vpconflictd from AVX-512CD), but it is only available on a few recent processors. The same applies to ARM with SVE/SVE2, which is very new and not yet available on the vast majority of ARM processors.
In short, there is only a slight chance that your processor can do this with SIMD instructions, but that does not mean it is impossible on every architecture. Note also that using #pragma omp simd is likely incorrect here because of the possible conflicts described above. Finally, on many modern processors the speed of this operation likely depends on the input data (random data do not behave like most real-world inputs).
Upvotes: 2