Reputation: 388
I have a buffer of 12-bit data (stored in 16-bit words) and need to convert it to 8-bit (shift right by 4).
How can NEON accelerate this processing?
Thank you for your help
Brahim
Upvotes: 0
Views: 483
Reputation: 4941
I took the liberty of assuming a few things, explained below, but this kind of code (untested, may require a few modifications) should provide a good speedup over a naive non-NEON version:
#include <arm_neon.h>
#include <stdint.h>
void convert(const uint16_t *restrict input, // the buffer to convert
             uint8_t *restrict output,       // the buffer in which to store result
             int sz) {                       // their (common) size
    /* Assuming the buffer size is a multiple of 8 */
    for (int i = 0; i < sz; i += 8) {
        // Load a vector of 8 16-bit values:
        uint16x8_t v = vld1q_u16(input + i);
        // Shift it by 4 to the right, narrowing it to 8 bit values.
        uint8x8_t shifted = vshrn_n_u16(v, 4);
        // Store it in output buffer
        vst1_u8(output + i, shifted);
    }
}
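If the size is not guaranteed to be a multiple of 8, one option is to run the vector loop over the largest multiple of 8 and finish the remainder with scalar code. This is only a minimal sketch, assuming unsigned data: the name convert_any is hypothetical, and the __ARM_NEON guard is there just so the same file also builds on targets without NEON.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* convert_any is a hypothetical name, not part of any library. */
void convert_any(const uint16_t *restrict input,
                 uint8_t *restrict output,
                 size_t sz) {
    size_t i = 0;
#if defined(__ARM_NEON)
    /* Vector part: 8 values per iteration, same operations as the loop above. */
    for (; i + 8 <= sz; i += 8) {
        uint16x8_t v = vld1q_u16(input + i);
        vst1_u8(output + i, vshrn_n_u16(v, 4));
    }
#endif
    /* Scalar tail (or the whole buffer when NEON is not available). */
    for (; i < sz; i++) {
        output[i] = (uint8_t)(input[i] >> 4);
    }
}
```

The scalar expression truncates exactly like vshrn_n_u16, so both parts of the buffer get the same result and no padding of the input is needed.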
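Things I assumed here: that you are working with unsigned values (if not, it is easy to adapt: uint* -> int*, *_u8 -> *_s8 and *_u16 -> *_s16) and, as the comment in the code says, that the buffer size is a multiple of 8.
Finally, the 2 resource pages used from the NEON documentation: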
Hope this helps!
Upvotes: 3
Reputation: 6354
Prototype: void dataConvert(void *pDst, void *pSrc, unsigned int count);
1:
vld1.16 {q8-q9}, [r1]!
vld1.16 {q10-q11}, [r1]!
vqrshrn.u16 d16, q8, #4
vqrshrn.u16 d17, q9, #4
vqrshrn.u16 d18, q10, #4
vqrshrn.u16 d19, q11, #4
vst1.16 {q8-q9}, [r0]!
subs r2, #32
bgt 1b
The q in vqrshrn stands for saturation, the r for rounding.
Change u16 to s16 in case of signed data.
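For comparison, the same rounding and saturating narrow can be written with the vqrshrn_n_u16 intrinsic. The following is a sketch under assumptions, not a translation of the assembly above: convert_rounded and round_sat_shr4 are hypothetical names, it handles 8 values per iteration rather than 32, and the scalar helper is my own model of the rounding (add 8, shift by 4, clamp to 255) used for the leftover elements and on targets without NEON.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical scalar model of a rounding, saturating shift right by 4. */
static uint8_t round_sat_shr4(uint16_t x) {
    uint32_t r = ((uint32_t)x + 8u) >> 4;   /* add half an output step, then shift */
    return (uint8_t)(r > 255u ? 255u : r);  /* clamp to the 8-bit range */
}

/* Hypothetical name; unsigned data assumed. */
void convert_rounded(const uint16_t *restrict src,
                     uint8_t *restrict dst,
                     size_t count) {
    size_t i = 0;
#if defined(__ARM_NEON)
    /* 8 values per iteration: load, rounding saturating narrow, store. */
    for (; i + 8 <= count; i += 8) {
        vst1_u8(dst + i, vqrshrn_n_u16(vld1q_u16(src + i), 4));
    }
#endif
    /* Remaining elements, or everything when NEON is not available. */
    for (; i < count; i++) {
        dst[i] = round_sat_shr4(src[i]);
    }
}
```

The difference from a plain truncating shift shows up at the edges: 0x018 becomes 2 instead of 1, and 0xFFF would round up to 0x100, which saturation clamps to 0xFF.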
Upvotes: 1