c++coptimizationbit-manipulationbranchless

Reputation: 1343

Constrain a 16 bit signed value between 0 and 4095 using Bit Manipulation only (without branching)

I want to constrain the value of a signed short variable between 0 and 4095, after which I take the most significant 8 bits as my final value for use elsewhere. Right now I'm doing it in a basic manner as below:

short color     = /* some external source */;
/* 
 * I get the color value as a 16 bit signed integer from an
 * external source I cannot trust. 16 bits are being used here
 * for higher precision.
 */

if ( color < 0 ) {
    color = 0;
}
else if ( color > 4095 ) {
    color = 4095;
}

unsigned char color8bit  = 0xFF & (color >> 4);
/*
 * color8bit is my final value which I would actually use
 * in my application.
 */

Is there any way this can be done using bit manipulation only, i.e. without using any conditionals? It might help quite a bit in speeding things up as this operation is happening thousands of time in the code.

The following won't help as it doesn't take care of edge cases such as negative values and overflows:

unsigned char color8bit = 0xFF & (( 0x0FFF & color ) >> 4 );

Edit: Adam Rosenfield's answer is the one which takes the correct approach but its incorrectly implemented. ouah's answer gives correct results but takes a different approach that what I originally intended to find out.

This is what I ended up using:

const static short min = 0;
const static short max = 4095;
color = min ^ (( min ^ color ) & -( min < color ));
color = max ^ (( color ^ max ) & -( color < max ));
unsigned char color8bit = 0xFF & (( 0x0FFF & color ) >> 4 );

Upvotes: 4

Answers (7)

Kirill Kobelev

Reputation: 10557

You can do the following:

BYTE data[0x10000] = { ..... };

BYTE byte_color = data[(unsiged short)short_color];

In your days 64kb table is not something outrageous and may be acceptable. The number of assembler commands in this variant of code will be absolute minimum compared to other possible approaches.

Upvotes: 5

dgnuff

Reputation: 3557

This is somewhat akin to Tom Seddon's answer, but uses a slightly cleaner way to do the clamp above. Note that both Mr. Seddon's answer and mine avoid the issue of ouah's answer that shifting a signed value to the right is implementation defined behavior, and hence not guaranteed to work on all architenctures.

#include <inttypes.h>
#include <iostream>

int16_t clamp(int16_t value)
{
    // clampBelow is 0xffff for -ve, 0x0000 for +ve
        int16_t const clampBelow = -static_cast<int16_t>(static_cast<uint16_t>(value) >> 15);

    // value is now clamped below at zero
    value &= ~clampBelow;
    // subtract 4095 so we can do the same trick again
    value -= 4095;
    // clampAbove is 0xffff for -ve, 0x0000 for +ve,
    // i.e. 0xffff for original value < 4095, 0x0000 for original >= 4096
        int16_t const clampAbove = -static_cast<int16_t>(static_cast<uint16_t>(value) >> 15);

    // adjusted value now clamped above at zero
    value &= clampAbove;
    // and restore to original value.
    value += 4095;
    return value;
}

void verify(int16_t value)
{
    int16_t const clamped = clamp(value);
    int16_t const check = (value < 0 ? 0 : value > 4095 ? 4095 : value);
    if (clamped != check)
    {
        std::cout << "Verification falure for value: " << value << ", clamped: " << clamped << ", check: " << check << std::endl;
    }
}

int main()
{
    for (int16_t i = 0x4000; i != 0x3fff; i++)
    {
        verify(i);
    }
    return 0;
}

That's a full test program (OK, so it doesn't test 0x3fff - sue me. ;) ) from which you can extract the clamp() routine for whatever you need.

I've also broken clamp out to "one step per line" for the sake of clarity. If your compiler has a half way decent optimizer, you can leave it as is and rely on the compiler to produce the best possible code. If your compiler's optimizer is not that great, then by all means, it can be reduced in line count, albeit at the cost of a little readability.

"Never sacrifice clarity for efficiency" -- Bob Buckley, comp sci professor, U-Warwick, Coventry, England, 1980.

Best piece of advice I ever got. ;)

Upvotes: 0

ouah

Reputation: 145839

short color = /* ... */
color =   ((((!!(color >> 12)) * 0xFFF)) | (!(color >> 12) * color ))
        & (!(color >> 15) * 0xFFF);

unsigned char color8bit  = 0xFF & (color >> 4);

It assumes two's complement representation.

This has the advantage of not using any equality or relational operators. There are situations you want to avoid branches at all costs: in some security applications you don't want the attackers to perform branch predictions. Without branches (in embedded processors particularly) you can make your function run in constant time for all inputs.

Note that: x * 0xFFF can be further reduced to (x << 12) - x. Also the multiplication in (!(color >> 12) * color ) can also be further optimized as the left operand of * here is 0 or 1.

EDIT:

I add a little explanation: the expression above simply does the same as below without the use of the conditional and relational operators:

y =   ((y > 4095 ? 4095 : 0) | (y > 4095 ? 0 : y))
    & (y < 0 ? 0 : 4095);

EDIT2:

as @HotLicks correctly noted in his comment, the ! is still a conceptual branch. Nevertheless it can also be computed with bitwise operators. For example !!a can be done with the trivial:

b = (a >> 15 | a >> 14 | ... | a >> 1 | a) & 1

and !a can be done as b ^ 1. And I'm sure there is a nice hack to do it more effectively.

Upvotes: 2

Mark Ransom

Reputation: 308206

I'm going to leave an answer even though it doesn't directly answer the original question, because in the end I think you'll find it much more useful.

I'm assuming that your color is coming from a camera or image scanner running at 12 bits, followed by some undetermined processing step that might create values beyond the 0 to 4095 range. If that's the case the values are almost certainly derived in a linear fashion. The problem is that displays are gamma corrected, so the conversion from 12 bit to 8 bit will require a non-linear gamma function rather than a simple right shift. This will be much slower than the clamping operation your question is trying to optimize. If you don't use a gamma function the image will appear too dark.

short color     = /* some external source */;
unsigned char color8bit;
if (color <= 0)
    color8bit = 0;
else if (color >= 4095)
    color8bit = 255;
else
    color8bit = (unsigned char)(255.99 * pow(color / 4095.0, 1/2.2));

At this point you might consider a lookup table as suggested by Kirill Kobelev.

Upvotes: 0

Ben Jackson

Reputation: 93760

You could also easily vectorize this using Intel's SSE intrinsics. One 128-bit register would hold 8 of your short and there are functions to min/max/shift/mask all of them in parallel. In a loop the constants for min/max can be preloaded into a register. The pshufb instruction (part of SSSE3) will even pack the bytes for you.

Upvotes: 1

Tom Seddon

Reputation: 2748

I assume a short is 16 bits.

Remove negative values:

int16_t mask=-(int16_t)((uint16_t)color>>15);//0xFFFF if +ve, 0 if -ve
short value=color&mask;//0 if -ve, colour if +ve

value is now between 0 and 32767 inclusive.

You can then do something similar to clamp the value:

mask=(uint16_t)(value-4096)>>15;//1 if <=4095, 0 if >4095
--mask;//0 if <=4095, 0xFFFF if >4095
mask&=0xFFF;//0 if <=4095, 4095 if >4095

value|=mask;//4095 if >4095, color if <4095

Upvotes: 2

Adam Rosenfield

Reputation: 400294

Yes, see these bit-twiddling hacks:

short color = ...;
color = color ^ (color & -(color < 0));  // color = max(color, 0)
color = 4096 ^ ((color ^ 4096) & -(color < 4096));  // color = min(color, 4096)

unsigned char color8bit  = 0xFF & (color >> 4);

Whether this actually turns out to be faster, I don't know -- you should profile. Most modern x86 and x86-64 chips these days support "conditional move" instructions (cmov) which conditionally store a value depending on the EFLAGS status bits, and optimizing compilers will often produce these instructions from ternary expressions like color >= 0 ? color : 0. Those will likely be fastest, but they won't run on older x86 chips.

Upvotes: 7

Constrain a 16 bit signed value between 0 and 4095 using Bit Manipulation only (without branching)

Answers (7)

Related Questions