Reputation: 782
for (int i = 0; i < someValue; i += 4) {
    __m64 mmxValue;
    if (i + 3 < someValue) {
        mmxValue = _mm_set_pi16(_buffer[i], _buffer[i + 1], _buffer[i + 2], _buffer[i + 3]);
        // add and use result
    } else if (i + 2 < someValue) {
        mmxValue = _mm_set_pi16(_buffer[i], _buffer[i + 1], _buffer[i + 2], 0);
        // add and use result
    } else if (i + 1 < someValue) {
        mmxValue = _mm_set_pi16(_buffer[i], _buffer[i + 1], 0, 0);
        // add and use result
    } else {
        mmxValue = _mm_set_pi16(_buffer[i], 0, 0, 0);
        // add and use result
    }
}
I'm trying to load mmxValue with up to four 16-bit signed values, which I then use in an addition in each branch.
Is it possible to rewrite this with fewer conditions (or none at all) so that it runs faster?
The conditions exist because any index >= someValue is out of range for _buffer.
Upvotes: 0
Views: 67
Reputation: 114579
A faster loop would step through the full blocks of 4 without any checks and stop before the last, possibly partial, block:
__m64 mmxValue;
int i = 0;
while (i <= someValue - 4) {
    mmxValue = _mm_set_pi16(_buffer[i],
                            _buffer[i + 1],
                            _buffer[i + 2],
                            _buffer[i + 3]);
    // ... use the result ...
    i += 4;
}
// ... handle only last block with conditionals ...
Even better, if possible, would be to enlarge _buffer so that it has
room for the extra zeros needed for padding; then the last block needs no special handling.
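As a rough sketch of how the two ideas above can be combined, the leftover 0 to 3 values can be copied into a small zero-filled block, so the tail needs a single check instead of a chain of conditions. The names sum_i16 and tail are made up for illustration, and summing the values stands in for whatever "use the result" means in the real code. It assumes an x86 compiler with MMX support.

```c
#include <mmintrin.h>  /* MMX intrinsics: __m64, _mm_set_pi16, _mm_add_pi16 */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: adds up buffer[0..count) four lanes at a time.
 * Each 16-bit lane wraps on overflow, as _mm_add_pi16 does. */
static int sum_i16(const int16_t *buffer, int count)
{
    __m64 acc = _mm_setzero_si64();
    int i = 0;

    /* Full blocks of 4: no bounds checks inside the loop. */
    for (; i + 4 <= count; i += 4) {
        __m64 v = _mm_set_pi16(buffer[i], buffer[i + 1],
                               buffer[i + 2], buffer[i + 3]);
        acc = _mm_add_pi16(acc, v);
    }

    /* Tail: copy the 1..3 leftover values into a zero-padded block. */
    if (i < count) {
        int16_t tail[4] = {0, 0, 0, 0};
        memcpy(tail, buffer + i, (size_t)(count - i) * sizeof tail[0]);
        acc = _mm_add_pi16(acc, _mm_set_pi16(tail[0], tail[1],
                                             tail[2], tail[3]));
    }

    /* Read the four lanes back and add them together. */
    int16_t lanes[4];
    memcpy(lanes, &acc, sizeof lanes);
    _mm_empty();  /* leave MMX state before any x87 floating-point code */
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

If _buffer itself can be allocated with up to 3 extra zeroed elements, the tail block disappears and only the first loop remains, run up to the padded length.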
Upvotes: 2