Reputation: 51283
I have an application where the user should be able to modify an image with sliders for hue, saturation and lightness. All image processing is done on the GPU using GLSL fragment shaders.
My problem is that RGB -> HSL -> RGB conversions are rather expensive on the gpu due to the extensive branching.
My question is whether I can convert the users "color adjustments" to some other color space which can more efficiently compute the adjusted image on the GPU.
Upvotes: 8
Views: 4971
Reputation: 1489
It's a mistake to assume that branching in the GPU and branching in code are the same thing.
For simple conditionals there's never any branching at all. GPUs have conditional move instructions that directly translate to ternary expressions and simple if-else statements.
Where things get problematic is when you have nested conditionals or multiple conditionally-dependent operations. Then you have to consider whether the GLSL compiler is smart enough to translate it all into cmoves. Whenever possible the compiler will emit code that executes all branches and recombine the result with conditional moves, but it can't always do that.
You've got to know when to help it. Never guess when you can measure - use AMD's GPU Shader Analyzer or Nvidia's GCG to view the assembly output. The instruction set of a GPU is very limited and simplistic so don't be scared of the word 'assembly.'
Here's a pair of RGB/HSL conversion functions which I've changed around so they play nicely with AMD's GLSL compiler, along with the assembly output. Credit goes to Paul Bourke for the original C conversion code.
// HSL range 0:1
vec4 convertRGBtoHSL( vec4 col )
{
float red = col.r;
float green = col.g;
float blue = col.b;
float minc = min3( col.r, col.g, col.b );
float maxc = max3( col.r, col.g, col.b );
float delta = maxc - minc;
float lum = (minc + maxc) * 0.5;
float sat = 0.0;
float hue = 0.0;
if (lum > 0.0 && lum < 1.0) {
float mul = (lum < 0.5) ? (lum) : (1.0-lum);
sat = delta / (mul * 2.0);
}
vec3 masks = vec3(
(maxc == red && maxc != green) ? 1.0 : 0.0,
(maxc == green && maxc != blue) ? 1.0 : 0.0,
(maxc == blue && maxc != red) ? 1.0 : 0.0
);
vec3 adds = vec3(
((green - blue ) / delta),
2.0 + ((blue - red ) / delta),
4.0 + ((red - green) / delta)
);
float deltaGtz = (delta > 0.0) ? 1.0 : 0.0;
hue += dot( adds, masks );
hue *= deltaGtz;
hue /= 6.0;
if (hue < 0.0)
hue += 1.0;
return vec4( hue, sat, lum, col.a );
}
Assembly output for this function:
1 x: MIN ____, R0.y, R0.z
y: ADD R127.y, -R0.x, R0.z
z: MAX ____, R0.y, R0.z
w: ADD R127.w, R0.x, -R0.y
t: ADD R127.x, R0.y, -R0.z
2 y: MAX R126.y, R0.x, PV1.z
w: MIN R126.w, R0.x, PV1.x
t: MOV R1.w, R0.w
3 x: ADD R125.x, -PV2.w, PV2.y
y: SETE_DX10 ____, R0.x, PV2.y
z: SETNE_DX10 ____, R0.y, PV2.y
w: SETE_DX10 ____, R0.y, PV2.y
t: SETNE_DX10 ____, R0.z, PV2.y
4 x: CNDE_INT R123.x, PV3.y, 0.0f, PV3.z
y: CNDE_INT R125.y, PV3.w, 0.0f, PS3
z: SETNE_DX10 ____, R0.x, R126.y
w: SETE_DX10 ____, R0.z, R126.y
t: RCP_e R125.w, PV3.x
5 x: MUL_e ____, PS4, R127.y
y: CNDE_INT R123.y, PV4.w, 0.0f, PV4.z
z: ADD/2 R127.z, R126.w, R126.y VEC_021
w: MUL_e ____, PS4, R127.w
t: CNDE_INT R126.x, PV4.x, 0.0f, 1065353216
6 x: MUL_e ____, R127.x, R125.w
y: CNDE_INT R123.y, R125.y, 0.0f, 1065353216
z: CNDE_INT R123.z, PV5.y, 0.0f, 1065353216
w: ADD ____, PV5.x, (0x40000000, 2.0f).y
t: ADD ____, PV5.w, (0x40800000, 4.0f).z
7 x: DOT4 ____, R126.x, PV6.x
y: DOT4 ____, PV6.y, PV6.w
z: DOT4 ____, PV6.z, PS6
w: DOT4 ____, (0x80000000, -0.0f).x, 0.0f
t: SETGT_DX10 R125.w, 0.5, R127.z
8 x: ADD R126.x, PV7.x, 0.0f
y: SETGT_DX10 ____, R127.z, 0.0f
z: ADD ____, -R127.z, 1.0f
w: SETGT_DX10 ____, R125.x, 0.0f
t: SETGT_DX10 ____, 1.0f, R127.z
9 x: CNDE_INT R127.x, PV8.y, 0.0f, PS8
y: CNDE_INT R123.y, R125.w, PV8.z, R127.z
z: CNDE_INT R123.z, PV8.w, 0.0f, 1065353216
t: MOV R1.z, R127.z
10 x: MOV*2 ____, PV9.y
w: MUL ____, PV9.z, R126.x
11 z: MUL_e R127.z, PV10.w, (0x3E2AAAAB, 0.1666666716f).x
t: RCP_e ____, PV10.x
12 x: ADD ____, PV11.z, 1.0f
y: SETGT_DX10 ____, 0.0f, PV11.z
z: MUL_e ____, R125.x, PS11
13 x: CNDE_INT R1.x, PV12.y, R127.z, PV12.x
y: CNDE_INT R1.y, R127.x, 0.0f, PV12.z
Notice that there are no branching instructions. It's conditional moves all the way, pretty much exactly as I wrote them.
The hardware needed for a conditional move is just a binary comparator (5 gates per bit) and a bunch of traces. Very fast.
Another fun thing to notice is that there's no divides. Instead the compiler used an approximate reciprocal and a multiply instruction. It does this for sqrt operations as well a lot of the time. You can pull the same tricks on a CPU with (for example) the SSE rcpps and rsqrtps instructions.
Now the reverse operation:
// HSL [0:1] to RGB [0:1]
vec4 convertHSLtoRGB( vec4 col )
{
const float onethird = 1.0 / 3.0;
const float twothird = 2.0 / 3.0;
const float rcpsixth = 6.0;
float hue = col.x;
float sat = col.y;
float lum = col.z;
vec3 xt = vec3(
rcpsixth * (hue - twothird),
0.0,
rcpsixth * (1.0 - hue)
);
if (hue < twothird) {
xt.r = 0.0;
xt.g = rcpsixth * (twothird - hue);
xt.b = rcpsixth * (hue - onethird);
}
if (hue < onethird) {
xt.r = rcpsixth * (onethird - hue);
xt.g = rcpsixth * hue;
xt.b = 0.0;
}
xt = min( xt, 1.0 );
float sat2 = 2.0 * sat;
float satinv = 1.0 - sat;
float luminv = 1.0 - lum;
float lum2m1 = (2.0 * lum) - 1.0;
vec3 ct = (sat2 * xt) + satinv;
vec3 rgb;
if (lum >= 0.5)
rgb = (luminv * ct) + lum2m1;
else rgb = lum * ct;
return vec4( rgb, col.a );
}
(edited 05/July/2013: I made a mistake when translating this function orignally. The assembly has also been updated).
Assembly output:
1 x: ADD ____, -R2.x, 1.0f
y: ADD ____, R2.x, (0xBF2AAAAB, -0.6666666865f).x
z: ADD R0.z, -R2.x, (0x3F2AAAAB, 0.6666666865f).y
w: ADD R0.w, R2.x, (0xBEAAAAAB, -0.3333333433f).z
2 x: SETGT_DX10 R0.x, (0x3F2AAAAB, 0.6666666865f).x, R2.x
y: MUL R0.y, PV2.x, (0x40C00000, 6.0f).y
z: MOV R1.z, 0.0f
w: MUL R1.w, PV2.y, (0x40C00000, 6.0f).y
3 x: MUL ____, R0.w, (0x40C00000, 6.0f).x
y: MUL ____, R0.z, (0x40C00000, 6.0f).x
z: ADD R0.z, -R2.x, (0x3EAAAAAB, 0.3333333433f).y
w: MOV ____, 0.0f
4 x: CNDE_INT R0.x, R0.x, R0.y, PV4.x
y: CNDE_INT R0.y, R0.x, R1.z, PV4.y
z: CNDE_INT R1.z, R0.x, R1.w, PV4.w
w: SETGT_DX10 R1.w, (0x3EAAAAAB, 0.3333333433f).x, R2.x
5 x: MUL ____, R2.x, (0x40C00000, 6.0f).x
y: MUL ____, R0.z, (0x40C00000, 6.0f).x
z: ADD R0.z, -R2.y, 1.0f
w: MOV ____, 0.0f
6 x: CNDE_INT R127.x, R1.w, R0.x, PV6.w
y: CNDE_INT R127.y, R1.w, R0.y, PV6.x
z: CNDE_INT R127.z, R1.w, R1.z, PV6.y
w: ADD R1.w, -R2.z, 1.0f
7 x: MULADD R0.x, R2.z, (0x40000000, 2.0f).x, -1.0f
y: MIN*2 ____, PV7.x, 1.0f
z: MIN*2 ____, PV7.y, 1.0f
w: MIN*2 ____, PV7.z, 1.0f
8 x: MULADD R1.x, PV8.z, R2.y, R0.z
y: MULADD R127.y, PV8.w, R2.y, R0.z
z: SETGE_DX10 R1.z, R2.z, 0.5
w: MULADD R0.w, PV8.y, R2.y, R0.z
9 x: MULADD R0.x, R1.w, PV9.x, R0.x
y: MULADD R0.y, R1.w, PV9.y, R0.x
z: MUL R0.z, R2.z, PV9.y
w: MULADD R1.w, R1.w, PV9.w, R0.x
10 x: MUL ____, R2.z, R0.w
y: MUL ____, R2.z, R1.x
w: MOV R2.w, R2.w
11 x: CNDE_INT R2.x, R1.z, R0.z, R0.y
y: CNDE_INT R2.y, R1.z, PV11.y, R0.x
z: CNDE_INT R2.z, R1.z, PV11.x, R1.w
Again no branches. Yum!
Upvotes: 16
Reputation: 11277
I believe that conversion between RGB and HSV/HSL could be coded without branching at all. For example, how conversion RGB -> HSV without branching could look in GLSL:
vec3 RGBtoHSV( float r, float g, float b) {
float minv, maxv, delta;
vec3 res = vec3(0.0);
minv = min(min(r, g), b);
maxv = max(max(r, g), b);
res.z = maxv;
delta = maxv - minv;
// branch1 maxv == 0.0
float br1 = 1.0 - abs(sign(maxv));
res.y = mix(delta / maxv, 0.0, br1);
res.x = mix(res.x, -1.0, br1);
// branch2 r == maxv
float br2 = abs(sign(r - maxv));
float br2_or_br1 = max(br2,br1);
res.x = mix(( g - b ) / delta, res.x, br2_or_br1);
// branch3 g == maxv
float br3 = abs(sign(g - maxv));
float br3_or_br1 = max(br3,br1);
res.x = mix(2.0 + ( b - r ) / delta, res.x, br3_or_br1);
// branch4 r != maxv && g != maxv
float br4 = 1.0 - br2*br3;
float br4_or_br1 = max(br4,br1);
res.x = mix(4.0 + ( r - g ) / delta, res.x, br4_or_br1);
res.x = mix(res.x * 60.0, res.x, br1);
// branch5 res.x < 0.0
float br5 = clamp(sign(res.x),-1.0,0.0) + 1.0;
float br5_or_br1 = max(br5,br1);
res.x = mix(res.x + 360.0, res.x, br5_or_br1);
return res;
}
But I've not benchmarked this solution. It can be that some performance gain that we win without branching here can be compensated by performance losses of redundant code execution. So extensive testing is needed...
Upvotes: 2
Reputation: 51
I had the same question, but I found a very simple solution which suits my needs, perhaps its also useful to you. The Saturation of a color is basically it's spread, I believe that is the euclidean distance between the RGB values and their average. regardless of that, if you simply take the average of the maximum and minimum of the RGB values, and scale the colors relative to that pivot, the effect is a very decent increase (or decrease) in saturation.
in a glsl shader you would write:
float pivot=(min(min(color.x, color.y), color.z)+max(max(color.x, color.y), color.z))/2.0;
color.xyz -= vec3( pivot );
color.xyz *= saturationScale;
color.xyz += vec3( pivot );
Upvotes: 5
Reputation: 1245
For lightness and saturation you can use YUV (actually YCbCr). It's easy to convert from RGB and back. No branching needed. Saturation is controlled by increasing or decreasing both Cr and Cb. Lightness is Y.
You get something similar to HSL hue modification by rotating Cb and Cr components (it's practically a 3D vector), but of course it depends on your application if that's enough.
Edit: A color component (Cb,Cr) is a point in a color plane like above. If you take any random point and rotate it around the center, result is hue changing. But as mechanism is a bit different than in HSL, results are not precisely same.
Image is public domain from Wikipedia.
Upvotes: 10
Reputation: 5283
You could use a 3D Look Up Table to store the color transform, the table would be updated by the user variables, but there may be simpler approches.
More informations are available in GPU Gems 2.
Upvotes: 1