Reputation: 92
I am trying to calculate the following using NEON in assembly: ((200*(53-255))/255) + 255, whose result should be approximately 97.
I've tested it at http://szeged.github.io/nevada/ and also on a tablet with a dual-core Cortex-A7 ARM CPU, and the result is 243, which is not correct.
How should I implement this to get the correct result of 97?
d2 contains 200,200,200,200,200,200,200,200
d4 contains 255,255,255,255,255,255,255,255
d6 contains 53,53,53,53,53,53,53,53
vsub.s8 d8, d6, d4 (53 - 255 results in d8 = 54,54,54,54,54,54,54,54)
vmull.s8 q5,d8,d2 (54 * 200 results in q5 = 244,48,244,48,244,48,244,48,244,48,244,48,244,48,244,48)
vshrn.s16 d12, q5, #8 (divide by 255 results in d12 = 244,244,244,244,244,244,244,244)
vadd.s8 d5, d4, d12 (final result d5 = 243,243,243,243,243,243,243,243)
Upvotes: 0
Views: 361
Reputation: 6354
243 is absolutely correct.
The alpha channel is an unsigned 8-bit value, so you should use u8 or u16 instead of s8 and s16.
For ordinary arithmetic, where the bit width stays the same, the sign doesn't matter, but multiply long is a completely different story.
That's the reason ARM has two separate long-multiply instructions, UMULL and SMULL, while a single MUL instruction handles both signed and unsigned 32-bit multiplication.
54*200 never happens here, since 200 is interpreted as -56 in a signed multiply.
=>
54 * -56 = -3024
-3024 >> 8 = -12   // arithmetic shift rounds toward minus infinity
-12 + -1 = -13     // 255 = -1 as a signed byte
-13 = 243          // as an unsigned byte
What you actually have to do is change vmull.s8 to vmull.u8:
=>
54 * 200 = 10800
10800 >> 8 = 42
42 + -1 = 41
Honestly, I don't see how you expect a result of 97 from the operations above. And how is this supposed to be some kind of alpha blending, as one of the tags implies?
Further, >>8 is NOT /255; it's just a bad approximation. You might think you can live with precision that low, but it's FAR from sufficient for alpha blending.
You must be doing something wrong.
Upvotes: 1