kjgregory

Reputation: 686

Efficient way to split up RGB values in C

I'm writing some software for a 32-bit Cortex-M0 microcontroller in C and I'm doing a lot of manipulation of 32-bit RGB values. They are handled in a 32-bit integer format like 0x00BBRRGG. I want to be able to do math with them without worrying about carry bits spilling between the colors, so I need to split them up into three uint8 values. Is there an efficient way of doing this? I'm assuming the inefficient way would be as follows:

blue = (RGB >> 16) & 0xFF;
green = (RGB >> 8) & 0xFF;
red = RGB & 0xFF;

//do math

new_RGB = (blue << 16) | (green << 8) | red;

Also, I have a couple of interfaces and one of them uses the format 0x00RRGGBB and the other uses 0x00BBRRGG. Is there an efficient way to convert between the two?

Upvotes: 2

Views: 5551

Answers (5)

Nominal Animal

Reputation: 39308

I want to be able to do math with them without worrying about carry bits spilling between the colors, so I need to split them up into three uint8 values.

No, usually you do not need to (split them into three uint8 values). Consider this function:

uint32_t blend(const uint32_t argb0, const uint32_t argb1, const int phase)
{
    if (phase <= 0)
        return argb0;
    else
    if (phase < 256) {
        const uint32_t rb0 = argb0 & 0x00FF00FF;
        const uint32_t rb1 = argb1 & 0x00FF00FF;
        const uint32_t ag0 = (argb0 >> 8) & 0x00FF00FF;
        const uint32_t ag1 = (argb1 >> 8) & 0x00FF00FF;
        const uint32_t rb = rb1 * phase + (256 - phase) * rb0;
        const uint32_t ag = ag1 * phase + (256 - phase) * ag0;
        return ((rb & 0xFF00FF00u) >> 8)
             |  (ag & 0xFF00FF00u);
    } else
        return argb1;
}

This function implements a linear blend from color argb0 (phase <= 0) to argb1 (phase >= 256), by splitting each input vector (with four 8-bit components) into two vectors with two 16-bit components.

If you don't need the alpha channel, then it may be more efficient to work on pairs of color values (say, for each pair of pixels) -- so (0xRRGGBB, 0xrrggbb) is split into (0x00RR00BB, 0x00rr00bb, 0x00GG00gg) -- which in the above blend function means one less multiplication (but one more AND and one OR operation).

The speed of the 32-bit multiplication operation on Cortex-M0 devices varies between implementations. Some have a single-cycle multiplier; on others a multiplication takes 32 cycles. So, depending on the exact Cortex-M0 core used, replacing one multiplication with an AND and an OR may be a big speedup, or a slight slowdown.
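The pairwise variant described above might look roughly like the following sketch. The function name blend_pair, the array-based signature, and the restriction of phase to 0..256 are assumptions made for illustration; each pixel is taken to be 0x00RRGGBB with no alpha.

```c
#include <stdint.h>

/* Hypothetical helper: blend two 0x00RRGGBB pixels at once.
 * phase runs from 0 (result equals from[]) to 256 (result equals to[]). */
static void blend_pair(const uint32_t from[2], const uint32_t to[2],
                       uint32_t out[2], uint32_t phase)
{
    const uint32_t inv = 256u - phase;

    /* Red and blue of each pixel: 0x00RR00BB and 0x00rr00bb. */
    const uint32_t rb_a0 = from[0] & 0x00FF00FFu;
    const uint32_t rb_b0 = from[1] & 0x00FF00FFu;
    const uint32_t rb_a1 = to[0] & 0x00FF00FFu;
    const uint32_t rb_b1 = to[1] & 0x00FF00FFu;

    /* Greens of both pixels share one word: 0x00GG00gg. */
    const uint32_t gg0 = ((from[0] & 0x0000FF00u) << 8) | ((from[1] >> 8) & 0xFFu);
    const uint32_t gg1 = ((to[0] & 0x0000FF00u) << 8) | ((to[1] >> 8) & 0xFFu);

    /* Each 16-bit lane holds at most 255 * 256, so lanes cannot overflow. */
    const uint32_t rb_a = rb_a1 * phase + rb_a0 * inv;
    const uint32_t rb_b = rb_b1 * phase + rb_b0 * inv;
    const uint32_t gg   = gg1 * phase + gg0 * inv;

    /* Keep the high byte of every lane and move it back into place. */
    out[0] = ((rb_a >> 8) & 0x00FF00FFu) | ((gg >> 16) & 0x0000FF00u);
    out[1] = ((rb_b >> 8) & 0x00FF00FFu) | (gg & 0x0000FF00u);
}
```

This uses six multiplications for two pixels, compared with four per pixel in blend above, at the cost of a few extra shifts and masks to gather and scatter the green channels.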

 
When you actually do need the separate components, leaving the splitting to the compiler often leads to better generated code: instead of passing the color by value, pass a pointer to the color value:

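uint32_t  some_op(const uint32_t *const argb)
{
    /* Byte order assumes a little-endian target (the usual Cortex-M0
       configuration) and a 0xAARRGGBB value: byte 0 is the least
       significant byte, so it holds the blue component. */
    const uint32_t  b = ((const uint8_t *)argb)[0];
    const uint32_t  g = ((const uint8_t *)argb)[1];
    const uint32_t  r = ((const uint8_t *)argb)[2];
    const uint32_t  a = ((const uint8_t *)argb)[3];

    /* Do something ... */

    /* Placeholder result: repack the components unchanged. */
    return (a << 24) | (r << 16) | (g << 8) | b;
}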

This is because many architectures have instructions that load an 8-bit value into a full register, setting all higher bits to zero (zero extension; on Cortex-M0, ldrb does this when loading from memory and uxtb does it for a value already in a register, and the C compiler picks the instruction for you). Marking the pointer, the pointed-to value, and the intermediate values const should allow the compiler to perform each access at the best point in the generated code, rather than having to keep the value in a register. This is especially true on architectures with few available registers, like 32-bit and 64-bit Intel and AMD architectures (x86 and x86-64). Cortex-M0 has 13 general-purpose 32-bit registers (r0 to r12), although most Thumb instructions can only use r0 to r7, and it depends on the ABI used which ones are "free" to use in a function.

 
Note that if you are using GCC to compile your code, you can use

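uint32_t oabc_to_ocba(uint32_t c)
{
    /* %0 is the output operand and %1 the input operand; the compiler
       may place them in different registers, so read from %1. */
    asm volatile ( "rev %0, %1\n\t"
                 : "=r" (c)
                 : "r" (c)
                 );
    return c >> 8;
}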

to convert 0x0ABC to 0x0CBA (that is, 0x00AABBCC to 0x00CCBBAA) and vice versa. Normally, it compiles to rev r0, r0; lsrs r0, r0, #8; bx lr, but the compiler can inline it and use a register other than r0.
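For compilers without GCC-style inline assembly, the same rearrangement can be sketched in portable C with shifts and masks. The function name swap_outer_bytes is hypothetical, and whether a given compiler turns this into a single rev instruction is not guaranteed; check the generated assembly.

```c
#include <stdint.h>

/* Hypothetical portable equivalent: 0x00AABBCC -> 0x00CCBBAA.
 * Any value in the top byte of the input is discarded. */
static inline uint32_t swap_outer_bytes(uint32_t c)
{
    return ((c & 0x000000FFu) << 16)   /* CC moves up to bits 16..23 */
         |  (c & 0x0000FF00u)          /* BB stays in place */
         | ((c >> 16) & 0x000000FFu);  /* AA moves down to bits 0..7 */
}
```

Applying the function twice returns the original 24-bit value, so the same routine serves for both directions.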

Upvotes: 2

Lundin

Reputation: 213513

Your "inefficient" way probably boils down to a few machine instructions, and shifts are fast. The shift version will execute very quickly, and micro-optimizations like that shouldn't be a concern in 99% of all applications.

Addressing the individual bytes through pointers/arrays is not necessarily a performance improvement. It might very well be the opposite - check the generated assembly. If you would use a struct/union solution, it should be for the sake of readability and not for micro-managing performance.

However, the shift version is superior when it comes to portability. When bit shifting, you don't have to worry about endianness, padding, alignment, or pointer aliasing, all of which could be issues with a struct/union solution.

The root of the problem is actually the 32 bit integer representation. If you can get rid of that, it will solve a lot of problems. The ideal format here would be uint8_t color[3];.
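As a rough illustration of combining the two points, the sketch below uses shifts only at the interface boundary and keeps the working data as uint8_t color[3]. The names unpack_rgb, pack_rgb, and add_saturate are hypothetical, and the packed layout (red in the low byte, blue in bits 16..23) is assumed from the shift code in the question.

```c
#include <stdint.h>

enum { RED, GREEN, BLUE, CHANNELS };

/* Split a packed value (red in bits 0..7, green in 8..15, blue in 16..23). */
static void unpack_rgb(uint32_t packed, uint8_t color[CHANNELS])
{
    color[RED]   = (uint8_t)(packed & 0xFFu);
    color[GREEN] = (uint8_t)((packed >> 8) & 0xFFu);
    color[BLUE]  = (uint8_t)((packed >> 16) & 0xFFu);
}

/* Rebuild the packed value; the top byte is always zero. */
static uint32_t pack_rgb(const uint8_t color[CHANNELS])
{
    return ((uint32_t)color[BLUE] << 16)
         | ((uint32_t)color[GREEN] << 8)
         |  (uint32_t)color[RED];
}

/* Example of per-channel math: addition clamped at 255, so nothing can
 * carry into a neighbouring channel. */
static uint8_t add_saturate(uint8_t a, uint8_t b)
{
    const unsigned sum = (unsigned)a + (unsigned)b;
    return (uint8_t)(sum > 255u ? 255u : sum);
}
```

Because only shifts and masks touch the packed value, the code behaves the same regardless of the target's byte order.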

Upvotes: 0

bd2357

Reputation: 794

It is not portable, but since you are on an M0 and probably in little-endian mode, you can use bit fields or a union of a uint32_t and an array of uint8_t.

typedef struct {
    uint32_t red: 8;
    uint32_t green: 8;
    uint32_t blue: 8;
    uint32_t spare: 8;
} rgb_s;

static rgb_s var; // statics init to zero
var.red = 0x56;
var.green = 0x34;
var.blue = 0x12;

uint32_t myInt = *(uint32_t*)&var;  // myInt is now 0x00123456;

Use static, or make sure the spare field is zeroed out if it is important.

Or, using a union:

enum {Red, Green, Blue, Colors};

typedef union {
    uint32_t rgb;
    uint8_t color[Colors];
} rgb_u;

rgb_u var;
var.rgb = 0x0;
var.color[Red] = 0x56;
var.color[Green] = 0x34;
var.color[Blue] = 0x12;

assert(var.rgb == 0x123456); //the uint32 overlays the array

Again, neither is really portable, but both are common in embedded code. You need to know the endianness of your processor (the M0 can be configured as big- or little-endian, but the default is little). There are also anonymous unions in C now (since C11), but not all embedded compilers support them.
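To make the byte-order dependence concrete, here is a small sketch that probes the byte order at run time and shows what the union overlay yields in each case. The helper names is_little_endian and overlay_rgb are hypothetical; the union mirrors the rgb_u type above.

```c
#include <stdint.h>
#include <string.h>

enum { Red, Green, Blue, Colors };

typedef union {
    uint32_t rgb;
    uint8_t  color[Colors];
} rgb_u;

/* Returns 1 when the lowest-addressed byte of a uint32_t is its
 * least significant byte, 0 otherwise. */
static int is_little_endian(void)
{
    const uint32_t probe = 1u;
    uint8_t first;
    memcpy(&first, &probe, sizeof first);
    return first == 1u;
}

/* Packs three components through the union; the numeric result
 * depends on the byte order of the target. */
static uint32_t overlay_rgb(uint8_t r, uint8_t g, uint8_t b)
{
    rgb_u var;
    var.rgb = 0;
    var.color[Red] = r;
    var.color[Green] = g;
    var.color[Blue] = b;
    return var.rgb;
}
```

On a little-endian target overlay_rgb(0x56, 0x34, 0x12) gives 0x00123456; on a big-endian target the same call gives 0x56341200, which is why the overlay approach needs the byte order to be known up front.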

Upvotes: 0

Jeroen3

Reputation: 935

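To convert 0x00RRGGBB to 0x00BBGGRR you can use the byte-reversal (endian conversion) instruction. Note that this produces BGR order, not the 0x00BBRRGG order written in the question: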

REV    r0,r0     ;0x00RRGGBB -> 0xBBGGRR00
LSRS   r0,r0,#8  ;0xBBGGRR00 -> 0x00BBGGRR

An efficient way to do this could be by writing an assembly function loading the maximum amount of data in free registers, performing the conversion on all registers, and writing them back.
Use the ARM procedure call standard as reference on how to write an assembly function called from C.

Another way is to simply perform byte copies, but this requires 3-4* reads/writes per pixel, whereas the approach above only requires 2.

*3 if you don't care about the top byte (xxRRGGBB), 4 if it must be zero (00RRGGBB).
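If the second interface really does use 0x00BBRRGG as the question states, the conversion is a rotation of the low 24 bits rather than a byte reversal. The following C sketch shows that reading; the function names are hypothetical, and the layout is an assumption taken literally from the question.

```c
#include <stddef.h>
#include <stdint.h>

/* 0x00RRGGBB -> 0x00BBRRGG: rotate the low 24 bits right by one byte. */
static inline uint32_t rgb_to_brg(uint32_t c)
{
    return ((c >> 8) & 0x0000FFFFu) | ((c & 0x000000FFu) << 16);
}

/* 0x00BBRRGG -> 0x00RRGGBB: rotate the low 24 bits left by one byte. */
static inline uint32_t brg_to_rgb(uint32_t c)
{
    return ((c & 0x0000FFFFu) << 8) | ((c >> 16) & 0x000000FFu);
}

/* Convert a whole buffer in place, one 32-bit load and store per pixel. */
static void rgb_to_brg_buffer(uint32_t *pixels, size_t count)
{
    for (size_t i = 0; i < count; i++)
        pixels[i] = rgb_to_brg(pixels[i]);
}
```

Each pixel still costs a single 32-bit read and write, the same memory traffic as the REV approach.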

Upvotes: 1

user1118321

Reputation: 26345

If you use a struct, you don't need to do any bit-shifting operations. I don't know whether this will be efficient with your particular processor, but you could make something simple like:

typedef struct xRGBPixel {
    unsigned char unused;
    unsigned char red;
    unsigned char green;
    unsigned char blue;
} xRGBPixel;

You can have a similar structure for the BRG pixels. (Are you sure it's BRG and not BGR? That's seriously weird and unconventional.)

If that's not as efficient, then Jonathan Leffler's suggestion in the comments about a union of a 32-bit int and an array of 4 unsigned char values may be a better fit. Something like this:

typedef union Pixel {
    uint32_t pixelAsInt;
    unsigned char pixelAsChar[4];
} Pixel;
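A possible use of that union, as a sketch only: per-channel math that treats all four bytes alike, so the result does not depend on which byte holds which color. The function name dim_half is hypothetical.

```c
#include <stdint.h>

typedef union Pixel {
    uint32_t pixelAsInt;
    unsigned char pixelAsChar[4];
} Pixel;

/* Halve every 8-bit channel independently; nothing can carry or
 * borrow across channel boundaries. */
static uint32_t dim_half(uint32_t packed)
{
    Pixel p;
    p.pixelAsInt = packed;
    for (int i = 0; i < 4; i++)
        p.pixelAsChar[i] = (unsigned char)(p.pixelAsChar[i] / 2);
    return p.pixelAsInt;
}
```

Operations that single out one color (say, only red) would again need to know the byte order of the target.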

Upvotes: 2
