birarduh

Reputation: 782

How can I optimize the arithmetic and conditionals in this for loop?

for (int i = 0; i < someValue; i += 4) {
  __m64 mmxValue;

  if (i + 3 < someValue) {
      mmxValue = _mm_set_pi16(_buffer[i], _buffer[i + 1], _buffer[i + 2], _buffer[i + 3]);
      // add and use result
  } else if (i + 2 < someValue) {
      mmxValue = _mm_set_pi16(_buffer[i], _buffer[i + 1], _buffer[i + 2], 0);
      // add and use result
  } else if (i + 1 < someValue) {
      mmxValue = _mm_set_pi16(_buffer[i], _buffer[i + 1], 0, 0);
      // add and use result
  } else {
      mmxValue = _mm_set_pi16(_buffer[i], 0, 0, 0);
      // add and use result
  }
}

I'm trying to load mmxValue with up to four 16-bit signed values, which I then use for an addition in each branch.

Is it possible to rewrite this with fewer conditions (or none at all) so that it runs faster?

The conditions exist because any index into _buffer that is >= someValue is out of range.

Upvotes: 0

Views: 67

Answers (1)

6502

Reputation: 114579

A faster loop advances by 4 with no conditionals until it reaches the last block; only that final, possibly partial, block needs the conditional handling:

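int i = 0;
__m64 mmxValue;
// Full blocks: all four indices are in range, so no checks are needed.
while (i <= someValue - 4) {
    mmxValue = _mm_set_pi16(_buffer[i],
                            _buffer[i+1],
                            _buffer[i+2],
                            _buffer[i+3]);
    // ... use the result ...
    i += 4;
}
// ... handle only the last block (fewer than 4 values left) with conditionals ...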
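
One way to handle that last block without an if/else chain is to copy the leftover values into a zero-filled temporary of four elements and process it like any other block. The following is only a sketch in plain C: add_block and add_all are hypothetical names, and add_block stands in for the "add and use result" step (with MMX it would be the _mm_set_pi16 load from the question followed by the addition).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for "add and use result": adds four 16-bit
   lanes into four running totals. */
static void add_block(int32_t totals[4], const int16_t block[4]) {
    for (int lane = 0; lane < 4; ++lane)
        totals[lane] += block[lane];
}

/* Processes `count` values: full blocks unconditionally, then at most
   one zero-padded tail block. */
static void add_all(int32_t totals[4], const int16_t *buffer, int count) {
    assert(count >= 0);

    int i = 0;
    for (; i + 4 <= count; i += 4)      /* all four indices are in range */
        add_block(totals, buffer + i);

    int remaining = count - i;          /* 0..3 values left over */
    if (remaining > 0) {
        int16_t tail[4] = {0, 0, 0, 0}; /* unused lanes stay zero */
        memcpy(tail, buffer + i, (size_t)remaining * sizeof tail[0]);
        add_block(totals, tail);
    }
}
```

The single remaining check runs once per call rather than once per iteration, and the zero lanes do not change the result of an addition.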

Even better, if possible, would be to enlarge _buffer so that it has room for the extra zeros needed for padding; the loop then needs no special case for the last block.
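
A minimal sketch of the padding idea, assuming you control how the buffer is allocated. The helper names (padded_count, alloc_padded, add_all_padded) are made up for illustration, and the inner lane loop stands in for the MMX load and addition from the question.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Rounds count up to the next multiple of 4. */
static size_t padded_count(size_t count) {
    return (count + 3u) & ~(size_t)3u;
}

/* Allocates a zero-filled buffer with room for the padding.
   The caller fills the first `count` slots and later calls free(). */
static int16_t *alloc_padded(size_t count) {
    size_t n = padded_count(count);
    return calloc(n ? n : 4, sizeof(int16_t));
}

/* With the padding in place, every block is a full block, so the loop
   body needs no bounds checks at all. */
static void add_all_padded(int32_t totals[4], const int16_t *buffer,
                           size_t count) {
    assert(buffer != NULL);
    size_t n = padded_count(count);
    for (size_t i = 0; i < n; i += 4)
        for (int lane = 0; lane < 4; ++lane)
            totals[lane] += buffer[i + lane];
}
```

The cost is at most three extra 16-bit slots, which must stay zero for as long as the buffer is in use.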

Upvotes: 2
