NEON Fixed point coding and Fixed vs Floating point operations performance comparison

Question

As the "ARM integer NEON operations cycles" and "ARM float NEON operations cycles" references show, integer multiply instructions do not seem to have a definite advantage over floating-point multiply instructions. When I converted my floating-point code to fixed point, I had to add a shift instruction after each fixed-point multiplication or division. The extra instructions increased the cycle count, so the fixed-point version of my program is slower: 14000 cycles for the floating-point code versus 26000 cycles for the fixed-point code.

Does NEON have any instructions dedicated to fixed-point operations (multiplication and division)? I found only one, which converts between fixed point and floating point. Is there an efficient way to write fixed-point programs for NEON?

I wrote the following sample floating-point code:

    VMUL   Q14.F32,Q8.F32,Q2.F32
    VMUL   Q15.F32,Q8.F32,Q3.F32
    VLD2    {Q10.F32,Q11.F32},[pTw2@256],TwdStep
    VLD2    {Q4.F32,Q5.F32},[pT1@256],fftSize
    VMLA   Q14.F32,Q9.F32,Q3.F32
    VMLS   Q15.F32,Q9.F32,Q2.F32

I then converted it to fixed point by inserting shift operations after the multiply and multiply-accumulate instructions:

    VMUL   Q14.S32,Q8.S32,Q2.S32
    VMUL   Q15.S32,Q8.S32,Q3.S32
    VLD2    {Q10.S32,Q11.S32},[pTw2@256],TwdStep
    VLD2    {Q4.S32,Q5.S32},[pT1@256],fftSize
    VMLA   Q14.S32,Q9.S32,Q3.S32
    VMLS   Q15.S32,Q9.S32,Q2.S32

    VRSHR    Q14.S32,Q14.S32,#12     ;Shift instructions to account for fixed point
    VRSHR    Q15.S32,Q15.S32,#12
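
For reference, here is a rough scalar C model of what one 32-bit lane of the fixed-point snippet computes. It is a sketch, not the original program: the function names are made up for illustration, the parameters are simply named after the registers they stand for, and it assumes a Q12 format (implied by the `#12` shift) and a compiler where `>>` on negative values is an arithmetic shift.

```c
#include <stdint.h>

#define FRAC_BITS 12  /* assumption: Q12, taken from the #12 shift above */

/* Low 32 bits of a 32x32 product, like a NEON 32-bit integer VMUL.
   Unsigned arithmetic is used so the wraparound is well defined. */
static int32_t mul_lo32(int32_t a, int32_t b) {
    return (int32_t)((uint32_t)a * (uint32_t)b);
}

/* Wrapping 32-bit add and subtract, as the VMLA/VMLS accumulation does. */
static int32_t add_wrap32(int32_t a, int32_t b) {
    return (int32_t)((uint32_t)a + (uint32_t)b);
}
static int32_t sub_wrap32(int32_t a, int32_t b) {
    return (int32_t)((uint32_t)a - (uint32_t)b);
}

/* Rounding shift right, modeled on VRSHR: add half an LSB, then shift.
   Assumes >> on a negative int64_t is arithmetic (true on mainstream
   compilers, but implementation-defined in ISO C). */
static int32_t rshr_round(int32_t x, int n) {
    return (int32_t)(((int64_t)x + ((int64_t)1 << (n - 1))) >> n);
}

/* One lane of the snippet:
     q14 = (q8*q2 + q9*q3) >> 12   (VMUL, VMLA, VRSHR)
     q15 = (q8*q3 - q9*q2) >> 12   (VMUL, VMLS, VRSHR) */
static void q12_lane(int32_t q8, int32_t q9, int32_t q2, int32_t q3,
                     int32_t *q14, int32_t *q15) {
    int32_t acc14 = add_wrap32(mul_lo32(q8, q2), mul_lo32(q9, q3));
    int32_t acc15 = sub_wrap32(mul_lo32(q8, q3), mul_lo32(q9, q2));
    *q14 = rshr_round(acc14, FRAC_BITS);
    *q15 = rshr_round(acc15, FRAC_BITS);
}
```

With the Q12 inputs 6144 (1.5), 2048 (0.5), 8192 (2.0) and 1024 (0.25) for `q8`, `q9`, `q2` and `q3`, this model yields 12800 (3.125) and -2560 (-0.625). The model also makes one limitation visible: because only the low 32 bits of each product are kept before the shift, the operands have to be small enough that the product and the accumulated sum still fit in 32 bits.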

