Reputation: 7371
I am currently working with a function like the one below:
void vadd(float *a, float *b, int n){
    for(int i = 0; i < n; i++){
        a[i] += b[i];
    }
}
This loop can be rewritten using SSE, which processes four floats at a time, but my question is: how do I handle the few elements that are left over when n is not a multiple of 4?
Thanks a lot, Bob
Upvotes: 1
Views: 136
Reputation: 363487
You can handle the last n % 4 elements with a separate loop:
void vadd(float *a, float *b, int n)
{
    int i = 0;
    for (; i < n - n % 4; i += 4) {
        a[i + 0] += b[i + 0];
        a[i + 1] += b[i + 1];
        a[i + 2] += b[i + 2];
        a[i + 3] += b[i + 3];
    }
    for (; i < n; i++) {
        a[i] += b[i];
    }
}
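
Since the question asks about SSE specifically, the same structure (a main loop in steps of 4, then a scalar loop for the remainder) could be written with intrinsics roughly as follows. This is only a sketch, assuming an x86 target with SSE and pointers that are not necessarily 16-byte aligned, hence the unaligned load/store variants. The names vadd_sse and vec_end are made up for illustration.

```c
#include <xmmintrin.h>  /* SSE: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */

/* Hypothetical SSE variant of vadd; assumes n >= 0 and an x86 target with SSE. */
void vadd_sse(float *a, const float *b, int n)
{
    int i = 0;
    int vec_end = n - n % 4;  /* largest multiple of 4 that is <= n */

    /* Main loop: add four floats per iteration. */
    for (; i < vec_end; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);  /* unaligned load of a[i..i+3] */
        __m128 vb = _mm_loadu_ps(b + i);  /* unaligned load of b[i..i+3] */
        _mm_storeu_ps(a + i, _mm_add_ps(va, vb));
    }

    /* Remainder loop: the last n % 4 elements, one at a time. */
    for (; i < n; i++) {
        a[i] += b[i];
    }
}
```

If both arrays are known to be 16-byte aligned, the aligned variants _mm_load_ps and _mm_store_ps could be used in the main loop instead. The remainder loop stays the same either way.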
Upvotes: 5