Reputation: 8430
I'm looking to compute trig functions in large parallel batches (in blocks of about 1024), and I'd like to take advantage of at least some of the parallelism that modern architectures offer.
When I compile a loop such as
for (int i = 0; i < SIZE; i++) {
    arr[i] = sin((float)i / 1024);
}
GCC won't vectorize it, and says
not vectorized: relevant stmt not supported: D.3068_39 = __builtin_sinf (D.3069_38);
That makes sense to me, but I'm wondering whether there's a library for parallel trig computations.
With just a simple Taylor series up to the 11th order, GCC vectorizes all the loops, and I get speeds more than twice as fast as a naive sin loop, with bit-exact answers. With a 9th-order series, only the last two of 1600 values are off, each by a single bit, for a >3x speedup. I'm sure someone has run into this problem before, but when I google it, I find no mention of any libraries or the like.
A. Is there something existing already?
B. If not, any advice for optimizing parallel trig functions?
EDIT: I found a library called "SLEEF" (http://shibatch.sourceforge.net/), which is described in this paper and uses SIMD instructions to calculate several elementary functions. It uses SSE- and AVX-specific code, but I don't think it will be hard to turn it into standard C loops.
Upvotes: 8
Views: 4445
Reputation: 61
You can check out this single-header library, which provides AVX/NEON trigonometry "intrinsics" and is easy to integrate: https://github.com/Geolm/math_intrinsics
Upvotes: 0
Reputation: 8430
My answer was to write my own library for exactly this, called vectrig: https://github.com/jeremysalwen/vectrig
Upvotes: 0
Reputation: 215221
Instead of the Taylor series, I would look at the algorithms fdlibm uses. They should give you as much precision in fewer steps.
Upvotes: 1
Reputation: 3059
Since you said you were using GCC, it looks like there are some options:
That said, I'd probably look into GPGPU for a solution, maybe writing it in CUDA or OpenCL (if I remember correctly, CUDA supports the sine function). Here are some libraries that look like they might make it easier.
Upvotes: 4
Reputation: 2667
Since you are looking to calculate harmonics here, I have some code that addressed a similar problem. It is vectorized already and faster than anything else I have found. As a side benefit, you get the cosine for free.
Upvotes: 2
Reputation: 106137
What platform are you using? Many libraries of this sort already exist:
Upvotes: 1