bfalz

Reputation: 92

ARM NEON: my calculation result is incorrect when negative numbers are involved

I am trying to calculate the following using NEON in assembly: ((200*(53-255))/255) + 255, whose result should be approximately 97.

I've tested this at http://szeged.github.io/nevada/ and also on a tablet with a dual-core Cortex-A7 ARM CPU. The result is 243, which is not correct.
How should I implement this to get the correct result of 97?

d2 contains 200,200,200,200,200,200,200,200
d4 contains 255,255,255,255,255,255,255,255
d6 contains 53,53,53,53,53,53,53,53

vsub.s8 d8, d6, d4  (53 - 255 results in d8 = 54,54,54,54,54,54,54,54)
vmull.s8 q5,d8,d2  (54 * 200 results in q5 = 244,48,244,48,244,48,244,48,244,48,244,48,244,48,244,48)
vshrn.s16 d12, q5, #8 (divide by 255 results in d12 = 244,244,244,244,244,244,244,244) 
vadd.s8 d5, d4, d12  (final result d5 = 243,243,243,243,243,243,243,243) 

Upvotes: 0

Views: 361

Answers (1)

243 is absolutely correct.

The alpha channel is an unsigned 8-bit value, so you should use u8 and u16 instead of s8 and s16.

For standard arithmetic where the bit width stays the same, the sign doesn't matter, but it's a completely different story for multiply long.

That's why ARM has two separate long-multiply instructions, UMULL and SMULL, while a single MUL instruction serves for both signed and unsigned 32-bit multiplication.
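To make the point about signedness concrete, here is a minimal scalar sketch in C of a single 8-bit lane. The function names `mul_low8`, `mull_u8` and `mull_s8` are made up for illustration; they only model the arithmetic of a same-width multiply, `vmull.u8` and `vmull.s8`, and are not NEON intrinsics. It assumes the usual two's-complement wrap when a value above 127 is converted to `int8_t`.

```c
#include <stdint.h>

/* Same-width multiply: the low 8 bits are identical for signed and
 * unsigned operands, so one instruction is enough. */
uint8_t mul_low8(uint8_t a, uint8_t b) {
    return (uint8_t)((unsigned)a * (unsigned)b);
}

/* Widening multiply with operands read as unsigned
 * (models one lane of vmull.u8). */
uint16_t mull_u8(uint8_t a, uint8_t b) {
    return (uint16_t)((unsigned)a * (unsigned)b);
}

/* Widening multiply with operands read as signed
 * (models one lane of vmull.s8). Converting 200 to int8_t is
 * implementation-defined in C; mainstream compilers wrap it to -56. */
int16_t mull_s8(uint8_t a, uint8_t b) {
    return (int16_t)((int8_t)a * (int8_t)b);
}
```

With the values from the question, `mull_u8(54, 200)` is 10800 and `mull_s8(54, 200)` is -3024, yet both share the same low byte, 48, which is also what `mul_low8(54, 200)` returns. Only the upper half of the widened product depends on the sign.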

54*200 is simply impossible in a signed 8-bit multiply, since 200 is interpreted as -56.

=>
54*-56 = -3024
-3024/256 = -12    // the shift rounds toward negative infinity
-12 + -1 = -13    // 255 = -1
-13 = 243

You actually have to change vmull.s8 to vmull.u8:

=>
54*200 = 10800
10800/256 = 42
42 + -1 = 41    // 255 = -1
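The two instruction sequences can be checked with a scalar model of one lane. This is only a sketch: `lane_signed` and `lane_unsigned` are hypothetical helper names, the parameters stand for one byte each of d6, d2 and d4 from the question, and the signed variant assumes the usual two's-complement wrap when converting to `int8_t`.

```c
#include <stdint.h>

/* One lane of: vsub (8-bit), vmull.s8, vshrn #8, vadd (8-bit). */
uint8_t lane_signed(uint8_t src, uint8_t alpha, uint8_t k) {
    uint8_t diff = (uint8_t)(src - k);            /* 53 - 255 wraps to 54 */
    int16_t prod = (int16_t)((int8_t)diff * (int8_t)alpha);
    uint8_t high = (uint8_t)((uint16_t)prod >> 8); /* keep bits 15..8 */
    return (uint8_t)(k + high);                   /* 8-bit wrapping add */
}

/* Same sequence with vmull.u8 in place of vmull.s8. */
uint8_t lane_unsigned(uint8_t src, uint8_t alpha, uint8_t k) {
    uint8_t diff = (uint8_t)(src - k);
    uint16_t prod = (uint16_t)((unsigned)diff * (unsigned)alpha);
    uint8_t high = (uint8_t)(prod >> 8);
    return (uint8_t)(k + high);
}
```

Calling `lane_signed(53, 200, 255)` reproduces the 243 from the question, while `lane_unsigned(53, 200, 255)` gives 41. Neither is 97, because the 8-bit subtraction has already wrapped -202 around to 54 before the multiply.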

Honestly, I don't know how you expect a result of 97 from the ops above. How is this supposed to be some kind of alpha blending, as one of the tags implies?

Further, >>8 is NOT /255. It divides by 256, which is just a bad approximation. You might think you can live with a precision that low, but it's FAR from sufficient when alpha blending.

You must be doing something wrong.
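For what it's worth, here is one possible way to reach 97, sketched as scalar C for a single lane. It rests on two assumptions that are not stated in the question: that the formula is meant as the blend dst + alpha*(src - dst)/255, and that it may be rearranged into the algebraically equivalent (alpha*src + (255 - alpha)*dst)/255 so that every intermediate stays unsigned and fits in 16 bits. The names `div255_round` and `blend_lane` are hypothetical. The division uses a well-known shift-and-add identity that is exact, with rounding to nearest, for inputs up to 255*255.

```c
#include <stdint.h>

/* Rounded division by 255 for x in [0, 65025] (0 .. 255*255),
 * using only adds and shifts. */
uint8_t div255_round(uint16_t x) {
    uint32_t t = (uint32_t)x + 128;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* Unsigned blend of one lane: (alpha*src + (255 - alpha)*dst) / 255.
 * The sum is at most 255*255 = 65025, so it fits in 16 bits. */
uint8_t blend_lane(uint8_t src, uint8_t dst, uint8_t alpha) {
    uint16_t acc = (uint16_t)((unsigned)alpha * src +
                              (unsigned)(255 - alpha) * dst);
    return div255_round(acc);
}
```

With the question's values, `blend_lane(53, 255, 200)` accumulates 200*53 + 55*255 = 24625 and returns 97, whereas a plain `24625 >> 8` would give 96. In NEON terms the accumulation would presumably map onto vmull.u8 followed by vmlal.u8, but that mapping is untested here.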

Upvotes: 1
