Reputation: 43
I'm trying to do FFT -> signal manipulation -> inverse FFT using Project NE10 in my C++ project, converting the complex output to amplitudes and phases after the FFT and back again before the IFFT. However, according to the benchmarks, my C++ code does not perform as well as the SIMD-enabled NE10 code. Since I have no experience with ARM assembly, I'm looking for help writing NEON code for the unoptimised C module. For example, before the IFFT I do this:
for (int bin = 0; bin < NUM_FREQUENCY_BINS; bin++) {
    input[bin].real = amplitudes[bin] * cosf(phases[bin]);
    input[bin].imag = amplitudes[bin] * sinf(phases[bin]);
}
where input is an array of C structs (for complex values), and amplitudes and phases are float arrays.
The above block (O(n) complexity) takes about 0.6 ms for 8192 bins, while the NE10 FFT (O(n log n) complexity) takes only 0.1 ms because of SIMD operations. From what I've read so far on StackOverflow and elsewhere, intrinsics are not worth the effort, so I'm looking at writing ARM NEON assembly only.
Upvotes: 4
Views: 293
Reputation: 4384
You can use NEON for trig functions if you settle for approximations. I am not affiliated, but there is an implementation here that uses intrinsics to create vectorised sin/cos functions that are accurate to many decimal places and perform substantially better than simply calling sinf, etc. (benchmarks are provided by the author).
The code is especially well suited to your polar-to-Cartesian calculation, as it generates the sin and cos results simultaneously. It might not be suitable where absolute precision is crucial, but for anything to do with frequency-domain audio processing that is normally not the case.
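To illustrate the idea of an approximation that yields sin and cos together, here is a minimal scalar sketch. It is not the linked implementation: `approx_sincos` is a hypothetical name, and the choice of truncated Taylor polynomials after reducing the argument to roughly [-pi/4, pi/4] is an assumption made for illustration. The range reduction uses a single-precision pi/2, so it is only meant for moderate phase values (a few multiples of 2*pi).

```cpp
#include <cmath>

// Hypothetical helper (not from any library): approximates sin(x) and cos(x)
// in one pass. Assumes |x| is moderate, since the range reduction below uses
// a single float constant for pi/2.
inline void approx_sincos(float x, float& s, float& c) {
    const float kTwoOverPi = 0.636619772f;
    const float kHalfPi = 1.57079633f;

    // Split x into k * (pi/2) + r, with r roughly in [-pi/4, pi/4].
    const long k = std::lrintf(x * kTwoOverPi);
    const float r = x - static_cast<float>(k) * kHalfPi;
    const float r2 = r * r;

    // Truncated Taylor series for sin(r) and cos(r), evaluated in Horner form.
    const float sp = r * (1.0f + r2 * (-1.0f / 6 + r2 * (1.0f / 120 + r2 * (-1.0f / 5040))));
    const float cp = 1.0f + r2 * (-0.5f + r2 * (1.0f / 24 + r2 * (-1.0f / 720 + r2 * (1.0f / 40320))));

    // Rotate the result back into the correct quadrant.
    switch (k & 3) {
        case 0:  s =  sp; c =  cp; break;
        case 1:  s =  cp; c = -sp; break;
        case 2:  s = -sp; c = -cp; break;
        default: s = -cp; c =  sp; break;
    }
}
```

Both polynomials share the same reduced argument, which is why computing the pair costs little more than computing one of them. A vectorised version would typically replace the `switch` with lane-wise selects so that four phases can be processed per iteration.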
Upvotes: 1
Reputation: 4038
As far as I know, NEON has no vector instructions for trigonometric functions (sin, cos). But you can still improve your code. One option is to use a table of precalculated sine and cosine values, which can improve performance significantly.
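A minimal sketch of the table approach, under some assumptions: `Complex`, `SinCosTable`, and `polar_to_cartesian` are hypothetical names, the table size must be a power of two, and each phase is rounded to the nearest table entry without interpolation. With 4096 entries the phase is quantised in steps of about 0.0015 rad, so the result is only accurate to roughly three decimal places; whether that is enough depends on the application.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical complex type mirroring the struct used in the question.
struct Complex { float real; float imag; };

constexpr double kTwoPi = 6.283185307179586;

// Hypothetical table of precalculated sine and cosine values over [0, 2*pi).
class SinCosTable {
public:
    // size must be a power of two so that wrapping is a cheap bit mask.
    explicit SinCosTable(std::size_t size)
        : mask_(size - 1),
          scale_(static_cast<float>(static_cast<double>(size) / kTwoPi)),
          sin_(size), cos_(size) {
        for (std::size_t k = 0; k < size; ++k) {
            const double angle = kTwoPi * static_cast<double>(k) / static_cast<double>(size);
            sin_[k] = static_cast<float>(std::sin(angle));
            cos_[k] = static_cast<float>(std::cos(angle));
        }
    }

    // Nearest-entry lookup; negative phases wrap around through the mask.
    void lookup(float phase, float& s, float& c) const {
        const long idx = std::lrintf(phase * scale_);
        const std::size_t k = static_cast<std::size_t>(idx) & mask_;
        s = sin_[k];
        c = cos_[k];
    }

private:
    std::size_t mask_;
    float scale_;
    std::vector<float> sin_, cos_;
};

// Polar-to-Cartesian conversion using the table instead of cosf/sinf.
inline void polar_to_cartesian(const SinCosTable& table, const float* amplitudes,
                               const float* phases, Complex* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        float s, c;
        table.lookup(phases[i], s, c);
        out[i].real = amplitudes[i] * c;
        out[i].imag = amplitudes[i] * s;
    }
}
```

The table is built once (for example `SinCosTable table(4096);`) and reused for every frame. A larger table or linear interpolation between neighbouring entries would trade memory or a few extra operations for accuracy.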
Regarding intrinsics versus assembly for NEON: I have tried both, and in most cases they give practically the same result (with a modern compiler), but assembly is more labor-intensive. The main performance improvement comes from handling the data correctly (loading, storing) and from using vector instructions, and both can be done with intrinsics.
Of course, if you want to achieve 100% CPU utilization you sometimes need to use assembly, but that is rare.
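As a rough sketch of what the load/store and vector-multiply part looks like with intrinsics, the function below multiplies amplitudes by already-computed cosine and sine values and writes the interleaved real/imag pairs, four bins per iteration. The names `Complex` and `multiply_and_interleave` are hypothetical, the cosine and sine arrays are assumed to come from elsewhere (a table or an approximation), and the struct is assumed to be two packed floats. On targets without NEON the scalar loop handles everything.

```cpp
#include <cstddef>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

// Hypothetical complex type mirroring the struct used in the question.
struct Complex { float real; float imag; };
static_assert(sizeof(Complex) == 2 * sizeof(float), "Complex must be two packed floats");

// out[i] = (amp[i] * cos_vals[i], amp[i] * sin_vals[i])
inline void multiply_and_interleave(const float* amp, const float* cos_vals,
                                    const float* sin_vals, Complex* out, std::size_t n) {
    std::size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    for (; i + 4 <= n; i += 4) {
        const float32x4_t a = vld1q_f32(amp + i);         // load 4 amplitudes
        float32x4x2_t v;
        v.val[0] = vmulq_f32(a, vld1q_f32(cos_vals + i)); // 4 real parts
        v.val[1] = vmulq_f32(a, vld1q_f32(sin_vals + i)); // 4 imaginary parts
        // vst2q_f32 interleaves the two vectors: re0, im0, re1, im1, ...
        vst2q_f32(reinterpret_cast<float*>(out + i), v);
    }
#endif
    // Scalar tail, and the complete loop on targets without NEON.
    for (; i < n; ++i) {
        out[i].real = amp[i] * cos_vals[i];
        out[i].imag = amp[i] * sin_vals[i];
    }
}
```

The interleaving store is the part that is easy to get wrong by hand: it writes the array-of-structs layout directly, so no separate shuffle step is needed.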
Upvotes: 0