Reputation: 51283

HSL Image Adjustments on GPU

I have an application where the user should be able to modify an image with sliders for hue, saturation and lightness. All image processing is done on the GPU using GLSL fragment shaders.

My problem is that RGB -> HSL -> RGB conversions are rather expensive on the GPU due to the extensive branching.

My question is whether I can convert the user's "color adjustments" to some other color space in which the adjusted image can be computed more efficiently on the GPU.

Upvotes: 8

Answers (5)

Spatial

Reputation: 1489

It's a mistake to assume that branching in the GPU and branching in code are the same thing.

For simple conditionals there's never any branching at all. GPUs have conditional move instructions that directly translate to ternary expressions and simple if-else statements.

Where things get problematic is when you have nested conditionals or multiple conditionally-dependent operations. Then you have to consider whether the GLSL compiler is smart enough to translate it all into cmoves. Whenever possible the compiler will emit code that executes all branches and recombine the result with conditional moves, but it can't always do that.

You've got to know when to help it. Never guess when you can measure - use AMD's GPU Shader Analyzer or Nvidia's GCG to view the assembly output. The instruction set of a GPU is very limited and simplistic so don't be scared of the word 'assembly.'

Here's a pair of RGB/HSL conversion functions which I've changed around so they play nicely with AMD's GLSL compiler, along with the assembly output. Credit goes to Paul Bourke for the original C conversion code.

// HSL range 0:1
vec4 convertRGBtoHSL( vec4 col )
{
    float red   = col.r;
    float green = col.g;
    float blue  = col.b;

    // nested min/max: min3/max3 are not core GLSL built-ins
    float minc  = min( col.r, min( col.g, col.b ) );
    float maxc  = max( col.r, max( col.g, col.b ) );
    float delta = maxc - minc;

    float lum = (minc + maxc) * 0.5;
    float sat = 0.0;
    float hue = 0.0;

    if (lum > 0.0 && lum < 1.0) {
        float mul = (lum < 0.5)  ?  (lum)  :  (1.0-lum);
        sat = delta / (mul * 2.0);
    }

    vec3 masks = vec3(
        (maxc == red   && maxc != green) ? 1.0 : 0.0,
        (maxc == green && maxc != blue)  ? 1.0 : 0.0,
        (maxc == blue  && maxc != red)   ? 1.0 : 0.0
    );

    vec3 adds = vec3(
              ((green - blue ) / delta),
        2.0 + ((blue  - red  ) / delta),
        4.0 + ((red   - green) / delta)
    );

    float deltaGtz = (delta > 0.0) ? 1.0 : 0.0;

    hue += dot( adds, masks );
    hue *= deltaGtz;
    hue /= 6.0;

    if (hue < 0.0)
        hue += 1.0;

    return vec4( hue, sat, lum, col.a );
}

Assembly output for this function:

 1  x: MIN         ____,    R0.y,   R0.z      
    y: ADD         R127.y, -R0.x,   R0.z      
    z: MAX         ____,    R0.y,   R0.z      
    w: ADD         R127.w,  R0.x,  -R0.y      
    t: ADD         R127.x,  R0.y,  -R0.z      
 2  y: MAX         R126.y,  R0.x,   PV1.z      
    w: MIN         R126.w,  R0.x,   PV1.x      
    t: MOV         R1.w,    R0.w      
 3  x: ADD         R125.x, -PV2.w,  PV2.y      
    y: SETE_DX10   ____,    R0.x,   PV2.y      
    z: SETNE_DX10  ____,    R0.y,   PV2.y      
    w: SETE_DX10   ____,    R0.y,   PV2.y      
    t: SETNE_DX10  ____,    R0.z,   PV2.y      
 4  x: CNDE_INT    R123.x,  PV3.y,  0.0f,   PV3.z      
    y: CNDE_INT    R125.y,  PV3.w,  0.0f,   PS3      
    z: SETNE_DX10  ____,    R0.x,   R126.y      
    w: SETE_DX10   ____,    R0.z,   R126.y      
    t: RCP_e       R125.w,  PV3.x      
 5  x: MUL_e       ____,    PS4,     R127.y      
    y: CNDE_INT    R123.y,  PV4.w,   0.0f,  PV4.z      
    z: ADD/2       R127.z,  R126.w,  R126.y      VEC_021 
    w: MUL_e       ____,    PS4,     R127.w      
    t: CNDE_INT    R126.x,  PV4.x,   0.0f,  1065353216      
 6  x: MUL_e       ____,    R127.x,  R125.w      
    y: CNDE_INT    R123.y,  R125.y,  0.0f,  1065353216      
    z: CNDE_INT    R123.z,  PV5.y,   0.0f,  1065353216      
    w: ADD         ____,    PV5.x,   (0x40000000, 2.0f).y      
    t: ADD         ____,    PV5.w,   (0x40800000, 4.0f).z      
 7  x: DOT4        ____,    R126.x,  PV6.x      
    y: DOT4        ____,    PV6.y,   PV6.w      
    z: DOT4        ____,    PV6.z,   PS6      
    w: DOT4        ____,    (0x80000000, -0.0f).x,  0.0f      
    t: SETGT_DX10  R125.w,  0.5,     R127.z      
 8  x: ADD         R126.x,  PV7.x,   0.0f      
    y: SETGT_DX10  ____,    R127.z,  0.0f      
    z: ADD         ____,   -R127.z,  1.0f      
    w: SETGT_DX10  ____,    R125.x,  0.0f      
    t: SETGT_DX10  ____,    1.0f,    R127.z      
 9  x: CNDE_INT    R127.x,  PV8.y,   0.0f,   PS8      
    y: CNDE_INT    R123.y,  R125.w,  PV8.z,  R127.z      
    z: CNDE_INT    R123.z,  PV8.w,   0.0f,   1065353216      
    t: MOV         R1.z,    R127.z      
10  x: MOV*2       ____,    PV9.y      
    w: MUL         ____,    PV9.z,   R126.x      
11  z: MUL_e       R127.z,  PV10.w,  (0x3E2AAAAB, 0.1666666716f).x      
    t: RCP_e       ____,    PV10.x      
12  x: ADD         ____,    PV11.z,  1.0f      
    y: SETGT_DX10  ____,    0.0f,    PV11.z      
    z: MUL_e       ____,    R125.x,  PS11      
13  x: CNDE_INT    R1.x,    PV12.y,  R127.z,  PV12.x      
    y: CNDE_INT    R1.y,    R127.x,  0.0f,    PV12.z

Notice that there are no branching instructions. It's conditional moves all the way, pretty much exactly as I wrote them.

The hardware needed for a conditional move is just a binary comparator (5 gates per bit) and a bunch of traces. Very fast.

Another fun thing to notice is that there are no divides. Instead, the compiler used an approximate reciprocal and a multiply instruction. It often does the same for sqrt operations. You can pull the same tricks on a CPU with (for example) the SSE rcpps and rsqrtps instructions.
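The pattern in the listing (one RCP_e feeding several MUL_e instructions) can be sketched on the CPU as follows. This is only an illustration in C: `hue_terms` is a hypothetical name, and `1.0f / delta` here is an exact division rather than the GPU's approximate reciprocal. It assumes `delta > 0`.

```c
/* Hue offsets for the three "which channel is the maximum" cases,
   computed with one reciprocal and three multiplies instead of
   three divides. Assumes delta > 0. */
void hue_terms(float r, float g, float b, float delta, float out[3])
{
    float inv = 1.0f / delta;          /* plays the role of the single RCP_e */
    out[0] =        (g - b) * inv;     /* red is the maximum   */
    out[1] = 2.0f + (b - r) * inv;     /* green is the maximum */
    out[2] = 4.0f + (r - g) * inv;     /* blue is the maximum  */
}
```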

Now the reverse operation:

// HSL [0:1] to RGB [0:1]
vec4 convertHSLtoRGB( vec4 col )
{
    const float onethird = 1.0 / 3.0;
    const float twothird = 2.0 / 3.0;
    const float rcpsixth = 6.0;

    float hue = col.x;
    float sat = col.y;
    float lum = col.z;

    vec3 xt = vec3(
        rcpsixth * (hue - twothird),
        0.0,
        rcpsixth * (1.0 - hue)
    );

    if (hue < twothird) {
        xt.r = 0.0;
        xt.g = rcpsixth * (twothird - hue);
        xt.b = rcpsixth * (hue      - onethird);
    } 

    if (hue < onethird) {
        xt.r = rcpsixth * (onethird - hue);
        xt.g = rcpsixth * hue;
        xt.b = 0.0;
    }

    xt = min( xt, 1.0 );

    float sat2   =  2.0 * sat;
    float satinv =  1.0 - sat;
    float luminv =  1.0 - lum;
    float lum2m1 = (2.0 * lum) - 1.0;
    vec3  ct     = (sat2 * xt) + satinv;

    vec3 rgb;
    if (lum >= 0.5)
         rgb = (luminv * ct) + lum2m1;
    else rgb =  lum    * ct;

    return vec4( rgb, col.a );
}

(Edited 05/July/2013: I made a mistake when translating this function originally. The assembly has also been updated.)

Assembly output:

1   x: ADD         ____,   -R2.x,  1.0f      
    y: ADD         ____,    R2.x,  (0xBF2AAAAB, -0.6666666865f).x      
    z: ADD         R0.z,   -R2.x,  (0x3F2AAAAB, 0.6666666865f).y      
    w: ADD         R0.w,    R2.x,  (0xBEAAAAAB, -0.3333333433f).z      
2   x: SETGT_DX10  R0.x,    (0x3F2AAAAB, 0.6666666865f).x,  R2.x      
    y: MUL         R0.y,    PV2.x,  (0x40C00000, 6.0f).y      
    z: MOV         R1.z,    0.0f      
    w: MUL         R1.w,    PV2.y,  (0x40C00000, 6.0f).y      
3   x: MUL         ____,    R0.w,  (0x40C00000, 6.0f).x      
    y: MUL         ____,    R0.z,  (0x40C00000, 6.0f).x      
    z: ADD         R0.z,   -R2.x,  (0x3EAAAAAB, 0.3333333433f).y      
    w: MOV         ____,    0.0f      
4   x: CNDE_INT    R0.x,    R0.x,   R0.y,  PV4.x      
    y: CNDE_INT    R0.y,    R0.x,   R1.z,  PV4.y      
    z: CNDE_INT    R1.z,    R0.x,   R1.w,  PV4.w      
    w: SETGT_DX10  R1.w,    (0x3EAAAAAB, 0.3333333433f).x,  R2.x      
5   x: MUL         ____,    R2.x,   (0x40C00000, 6.0f).x      
    y: MUL         ____,    R0.z,   (0x40C00000, 6.0f).x      
    z: ADD         R0.z,   -R2.y,   1.0f      
    w: MOV         ____,    0.0f      
6   x: CNDE_INT    R127.x,  R1.w,   R0.x,  PV6.w      
    y: CNDE_INT    R127.y,  R1.w,   R0.y,  PV6.x      
    z: CNDE_INT    R127.z,  R1.w,   R1.z,  PV6.y      
    w: ADD         R1.w,   -R2.z,   1.0f      
7   x: MULADD      R0.x,    R2.z,   (0x40000000, 2.0f).x, -1.0f      
    y: MIN*2       ____,    PV7.x,  1.0f      
    z: MIN*2       ____,    PV7.y,  1.0f      
    w: MIN*2       ____,    PV7.z,  1.0f      
8   x: MULADD      R1.x,    PV8.z,  R2.y,    R0.z      
    y: MULADD      R127.y,  PV8.w,  R2.y,    R0.z      
    z: SETGE_DX10  R1.z,    R2.z,            0.5      
    w: MULADD      R0.w,    PV8.y,  R2.y,    R0.z      
9   x: MULADD      R0.x,    R1.w,   PV9.x,   R0.x      
    y: MULADD      R0.y,    R1.w,   PV9.y,   R0.x      
    z: MUL         R0.z,    R2.z,   PV9.y      
    w: MULADD      R1.w,    R1.w,   PV9.w,   R0.x      
10  x: MUL         ____,    R2.z,   R0.w      
    y: MUL         ____,    R2.z,   R1.x      
    w: MOV         R2.w,    R2.w       
11  x: CNDE_INT    R2.x,    R1.z,   R0.z,    R0.y      
    y: CNDE_INT    R2.y,    R1.z,   PV11.y,  R0.x      
    z: CNDE_INT    R2.z,    R1.z,   PV11.x,  R1.w

Again no branches. Yum!

Upvotes: 16

Agnius Vasiliauskas

Reputation: 11277

I believe the conversion between RGB and HSV/HSL can be coded without branching at all. For example, here is how a branch-free RGB -> HSV conversion could look in GLSL:

vec3 RGBtoHSV( float r, float g, float b) {
   float minv, maxv, delta;
   vec3 res = vec3(0.0);

   minv = min(min(r, g), b);
   maxv = max(max(r, g), b);
   res.z = maxv;
   delta = maxv - minv;

   // branch1  maxv == 0.0
   float br1 = 1.0 - abs(sign(maxv));
   res.y = mix(delta / maxv, 0.0, br1); 
   res.x = mix(res.x, -1.0, br1);

   // branch2  r == maxv
   float br2 = abs(sign(r - maxv)); 
   float br2_or_br1 = max(br2,br1);
   res.x = mix(( g - b ) / delta, res.x, br2_or_br1);

   // branch3 g == maxv 
   float br3 = abs(sign(g - maxv));
   float br3_or_br1 = max(br3,br1);
   res.x = mix(2.0 + ( b - r ) / delta, res.x, br3_or_br1);

   // branch4 r != maxv && g != maxv 
   float br4 = 1.0 - br2*br3;
   float br4_or_br1 = max(br4,br1);
   res.x = mix(4.0 + ( r - g ) / delta, res.x, br4_or_br1);

   res.x = mix(res.x * 60.0, res.x, br1);

   // branch5 res.x < 0.0 
   float br5 = clamp(sign(res.x),-1.0,0.0) + 1.0;
   float br5_or_br1 = max(br5,br1);
   res.x = mix(res.x + 360.0, res.x, br5_or_br1);

   return res;
}

I have not benchmarked this solution, though. Whatever we gain by avoiding branches may be offset by the cost of executing redundant code, so extensive testing is needed.

Upvotes: 2

jochem

Reputation: 51

I had the same question and found a very simple solution that suits my needs; perhaps it's useful to you too. The saturation of a color is basically its spread; I believe that is the Euclidean distance between the RGB values and their average. Regardless, if you take the average of the maximum and minimum of the RGB values and scale the colors relative to that pivot, the effect is a very decent increase (or decrease) in saturation.

In a GLSL shader you would write:

float pivot=(min(min(color.x, color.y), color.z)+max(max(color.x, color.y), color.z))/2.0;
color.xyz -= vec3( pivot );
color.xyz *= saturationScale;
color.xyz += vec3( pivot );

Upvotes: 5

Virne

Reputation: 1245

For lightness and saturation you can use YUV (actually YCbCr). It's easy to convert from RGB and back, and no branching is needed. Saturation is controlled by scaling both Cb and Cr up or down. Lightness is Y.

You get something similar to HSL hue modification by rotating the Cb and Cr components around the Y axis (YCbCr is practically a 3D vector), but of course it depends on your application whether that's enough.

[Image: CbCr color plane]

Edit: A color component (Cb,Cr) is a point in a color plane like the one above. If you take any point and rotate it around the center, the result is a change of hue. But because the mechanism is a bit different from HSL's, the results are not exactly the same.

Image is public domain from Wikipedia.

Upvotes: 10

rotoglup

Reputation: 5283

You could use a 3D lookup table to store the color transform; the table would be updated whenever the user's settings change. There may be simpler approaches, though.

More information is available in GPU Gems 2.
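As a rough sketch of the table-building step, assuming the transform is available as a CPU function: the C code below fills an n x n x n table by evaluating a callback at every lattice point. The names `color_fn`, `build_lut` and `identity_transform` are hypothetical, and the memory layout (red varying fastest, three floats per entry) is my choice.

```c
#include <stddef.h>

typedef void (*color_fn)(float rgb[3]);   /* hypothetical: transforms rgb in place */

/* Example transform that leaves the color unchanged */
void identity_transform(float rgb[3]) { (void)rgb; }

/* Fills lut (3 * n * n * n floats, n >= 2). The entry at lattice point
   (r, g, b) holds transform(r/(n-1), g/(n-1), b/(n-1)).
   Layout: red varies fastest, then green, then blue. */
void build_lut(float *lut, int n, color_fn transform)
{
    float step = 1.0f / (float)(n - 1);

    for (int b = 0; b < n; ++b)
        for (int g = 0; g < n; ++g)
            for (int r = 0; r < n; ++r) {
                float rgb[3] = { r * step, g * step, b * step };
                transform(rgb);

                float *dst = lut + 3 * (((size_t)b * n + g) * n + r);
                dst[0] = rgb[0];
                dst[1] = rgb[1];
                dst[2] = rgb[2];
            }
}
```

The table would then be uploaded as a 3D texture and sampled in the fragment shader with the input color as the texture coordinate, with linear filtering interpolating between lattice points. The per-pixel cost then no longer depends on how complicated the transform is, at the price of rebuilding the table whenever a slider changes.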

Upvotes: 1
