Reputation: 124
I am trying to write a NEON SIMD version of the scalar code below.
Scalar code:
int *xt = new int[50];
float32_t input1[16] = {12.0f,12.0f,12.0f,12.0f,12.0f,12.0f,12.0f,12.0f,12.0f,12.0f,12.0f,12.0f,12.0f,12.0f,12.0f,12.0f,};
float32_t input2[16] = {13.0f,12.0f,9.0f,12.0f,12.0f,12.0f,12.0f,12.0f,13.0f,12.0f,9.0f,12.0f,12.0f,12.0f,12.0f,12.0f};
float32_t threshq = 13.0f;
uint32_t corners_count = 0;
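for (uint32_t x = 0; x < 16; x++)
{
    if ( (input1[x] == input2[x]) && (input2[x] > threshq) )
    {
        xt[corners_count] = x;   // record the index of the matching element
        corners_count++;         // advance to the next free output slot
    }
}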
Neon:
float32x4_t t1,t2,t3;
uint32x4_t rq1,rq2,rq3;
t1 = vld1q_f32(input1); // 12 12 12 12
t2 = vld1q_f32(input2); // 13 12 09 12
t3 = vdupq_n_f32(threshq); // 13 13 13 13
rq1 = vceqq_f32(t1,t2); // check whether input1 equals input2
rq2 = vcgtq_f32(t1,t3); // check whether input1 is greater than the threshold
rq3 = vandq_u32(rq1,rq2); // AND the results of the two conditions
for( int i = 0;i < 4; i++){
    corners_count = corners_count + rq3[i];
    // ...unable to write the NEON logic for this part
}
I am not able to write this logic in NEON. Can anyone guide me? I have worn myself out thinking about it.
Upvotes: 0
Views: 1025
Reputation: 213120
Because of the dependencies in your loop I think you need to re-factor your code into a SIMD loop followed by a scalar loop. Pseudo code:
// SIMD loop
for each set of 4 float elements
    apply SIMD threshold test
    store 4 x bool results in temp[]
// scalar loop
for each bool element in temp[]
    if temp[x]
        xt[corners_count] = x
        corners_count++
This way you get the benefits of SIMD for most of the operations, and you just have to resort to scalar code for the last part.
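As a rough illustration, the pseudo code above could be sketched in C++ as follows. This is only one possible reading, not a tuned implementation: the function name `find_corners` and its parameters are hypothetical, the length is assumed to be a multiple of 4, and the threshold test is applied to the second input as in the scalar code from the question. For brevity the scalar pass runs after each group of 4 instead of over one large temp[] array. The NEON path is only compiled when `arm_neon.h` is available; otherwise a plain scalar stand-in produces the same mask so the sketch still builds on other targets.

```cpp
#include <cstdint>

#if defined(__ARM_NEON) || defined(__aarch64__)
#include <arm_neon.h>
#define SKETCH_HAVE_NEON 1
#endif

// Hypothetical helper (not from the original post): writes every index x with
// a[x] == b[x] && b[x] > thresh into out[] and returns the number written.
// Assumption: n is a multiple of 4 and out[] has room for n entries.
static uint32_t find_corners(const float* a, const float* b, uint32_t n,
                             float thresh, int* out)
{
    uint32_t count = 0;
    for (uint32_t x = 0; x < n; x += 4) {
        uint32_t temp[4];  // per-lane results: all bits set = true, 0 = false

#ifdef SKETCH_HAVE_NEON
        // SIMD part: evaluate both conditions for 4 elements at once.
        float32x4_t va = vld1q_f32(a + x);
        float32x4_t vb = vld1q_f32(b + x);
        float32x4_t vt = vdupq_n_f32(thresh);
        uint32x4_t eq = vceqq_f32(va, vb);   // a == b
        uint32x4_t gt = vcgtq_f32(vb, vt);   // b > thresh
        vst1q_u32(temp, vandq_u32(eq, gt));  // store the 4 masks in temp[]
#else
        // Portable stand-in that produces the same masks without NEON.
        for (uint32_t i = 0; i < 4; i++) {
            bool hit = (a[x + i] == b[x + i]) && (b[x + i] > thresh);
            temp[i] = hit ? 0xFFFFFFFFu : 0u;
        }
#endif

        // Scalar part: the output position depends on how many earlier
        // elements matched, so the indices are appended one lane at a time.
        for (uint32_t i = 0; i < 4; i++) {
            if (temp[i]) {
                out[count] = static_cast<int>(x + i);
                count++;
            }
        }
    }
    return count;
}
```

Called with the two 16-element arrays from the question and a threshold of 13.0f, this returns 0, because no element is both equal in the two inputs and strictly greater than 13. Note that a true lane holds 0xFFFFFFFF rather than 1, which is why the mask is tested for non-zero instead of being added to the counter directly.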
Upvotes: 1