Reputation: 107
I am uploading frames from a camera to a texture on the GPU for processing (using SharpDX). My issue at the moment is that the frames come in as 24-bit RGB, but DX11 no longer has a 24-bit RGB texture format, only 32-bit RGBA. After every 3 bytes I need to add another byte with the value 255 (no transparency). I've tried iterating through the byte array to add it, but that is too expensive. Using GDI bitmaps to convert is also very expensive.
int count = 0;
for (int i = 0; i < frameDataBGRA.Length - 3; i += 4)
{
    frameDataBGRA[i] = frameData[i - count];
    frameDataBGRA[i + 1] = frameData[(i + 1) - count];
    frameDataBGRA[i + 2] = frameData[(i + 2) - count];
    frameDataBGRA[i + 3] = 255;
    count++;
}
Upvotes: 1
Views: 2580
Reputation: 763
@catflier: good work, but it can go a little faster. ;-)
Reproduced times on my hardware:
My experiments:
Things that have improved speed:
The Code:
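static unsafe void FastConvert(int pixelCount, byte[] rgbData, byte[] rgbaData)
{
    if (pixelCount <= 0) return;
    fixed (byte* rgbP = &rgbData[0], rgbaP = &rgbaData[0])
    {
        // Read 4 bytes per pixel (R, G, B plus the next pixel's first byte) and
        // overwrite the top byte with 255; this assumes a little-endian CPU.
        // Stop one pixel early: a 4-byte read at the last pixel would run one byte past rgbData.
        int last = pixelCount - 1;
        for (long i = 0, offsetRgb = 0; i < last; i++, offsetRgb += 3)
        {
            ((uint*)rgbaP)[i] = *(uint*)(rgbP + offsetRgb) | 0xff000000;
        }
        // Copy the last pixel byte by byte.
        byte* src = rgbP + last * 3L, dst = rgbaP + last * 4L;
        dst[0] = src[0]; dst[1] = src[1]; dst[2] = src[2]; dst[3] = 255;
    }
}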
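// Converts pixelCount pixels, four per iteration; pixelCount must be a multiple of 4.
// The last read (c4) touches one byte past pixelCount * 3, so the caller must make
// sure that byte is readable.
static unsafe void FastConvert4Loop(long pixelCount, byte* rgbP, byte* rgbaP)
{
    for (long i = 0, offsetRgb = 0; i < pixelCount; i += 4, offsetRgb += 12)
    {
        uint c1 = *(uint*)(rgbP + offsetRgb);
        uint c2 = *(uint*)(rgbP + offsetRgb + 3);
        uint c3 = *(uint*)(rgbP + offsetRgb + 6);
        uint c4 = *(uint*)(rgbP + offsetRgb + 9);
        ((uint*)rgbaP)[i] = c1 | 0xff000000;
        ((uint*)rgbaP)[i + 1] = c2 | 0xff000000;
        ((uint*)rgbaP)[i + 2] = c3 | 0xff000000;
        ((uint*)rgbaP)[i + 3] = c4 | 0xff000000;
    }
}
static unsafe void FastConvert4(int pixelCount, byte[] rgbData, byte[] rgbaData)
{
    if ((pixelCount & 3) != 0) throw new ArgumentException("pixelCount must be a multiple of 4.", nameof(pixelCount));
    if (pixelCount <= 0) return;
    fixed (byte* rgbP = &rgbData[0], rgbaP = &rgbaData[0])
    {
        // Run the unrolled loop on all but the last four pixels, where its 4-byte
        // reads stay inside rgbData, then finish the tail byte by byte.
        FastConvert4Loop(pixelCount - 4, rgbP, rgbaP);
        for (int i = pixelCount - 4; i < pixelCount; i++)
        {
            byte* src = rgbP + i * 3L, dst = rgbaP + i * 4L;
            dst[0] = src[0]; dst[1] = src[1]; dst[2] = src[2]; dst[3] = 255;
        }
    }
}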
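The loops above rely on byte order: on a little-endian machine, a 4-byte read starting at a pixel puts R, G and B in the three low bytes of the uint and the next pixel's first byte in the top byte, which `| 0xff000000` then replaces with 255. A minimal safe sketch of that single step, assuming a little-endian CPU; `PackDemo` and `PackOnePixel` are hypothetical names used only for illustration:

```csharp
using System;

public static class PackDemo
{
    // Hypothetical helper: packs the pixel starting at 'offset' into 4 RGBA bytes.
    // Needs at least 4 readable bytes from 'offset', like the pointer versions.
    public static byte[] PackOnePixel(byte[] rgb, int offset)
    {
        // The top byte of 'packed' initially holds rgb[offset + 3] (little-endian assumption).
        uint packed = BitConverter.ToUInt32(rgb, offset);
        // Force the top byte (the alpha slot in memory order) to 255.
        packed |= 0xff000000u;
        return BitConverter.GetBytes(packed);
    }
}
```

On a little-endian machine, `PackDemo.PackOnePixel(new byte[] { 10, 20, 30, 40 }, 0)` yields the bytes 10, 20, 30, 255.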
Upvotes: 1
Reputation: 8953
Assuming you can compile with unsafe, using pointers will give you a significant boost in this case.
First create two structs to hold data in a packed way:
[StructLayout(LayoutKind.Sequential)]
public struct RGBA
{
    public byte r;
    public byte g;
    public byte b;
    public byte a;
}
[StructLayout(LayoutKind.Sequential)]
public struct RGB
{
    public byte r;
    public byte g;
    public byte b;
}
First version:
static unsafe void Process_Pointer_PerChannel(int pixelCount, byte[] rgbData, byte[] rgbaData)
{
    fixed (byte* rgbPtr = &rgbData[0])
    {
        fixed (byte* rgbaPtr = &rgbaData[0])
        {
            // Walk both buffers one pixel at a time through typed pointers.
            RGB* rgb = (RGB*)rgbPtr;
            RGBA* rgba = (RGBA*)rgbaPtr;
            for (int i = 0; i < pixelCount; i++)
            {
                rgba->r = rgb->r;
                rgba->g = rgb->g;
                rgba->b = rgb->b;
                rgba->a = 255;
                rgb++;
                rgba++;
            }
        }
    }
}
This avoids a lot of indexing, and passes data directly.
Another version, which is slightly faster, copies the three color bytes in a single struct assignment by casting the destination pointer:
static unsafe void Process_Pointer_Cast(int pixelCount, byte[] rgbData, byte[] rgbaData)
{
    fixed (byte* rgbPtr = &rgbData[0])
    {
        fixed (byte* rgbaPtr = &rgbaData[0])
        {
            RGB* rgb = (RGB*)rgbPtr;
            RGBA* rgba = (RGBA*)rgbaPtr;
            for (int i = 0; i < pixelCount; i++)
            {
                // View the first 3 bytes of the destination pixel as an RGB struct
                // and copy the source pixel in one assignment.
                RGB* cp = (RGB*)rgba;
                *cp = *rgb;
                rgba->a = 255;
                rgb++;
                rgba++;
            }
        }
    }
}
One small extra optimization (the gain is marginal): if you keep the same output array and reuse it for every frame, you can initialize it once with alpha set to 255, e.g.:
static void InitRGBA_Alpha(int pixelCount, byte[] rgbaData)
{
    for (int i = 0; i < pixelCount; i++)
    {
        rgbaData[i * 4 + 3] = 255;
    }
}
Then as you will never change this channel, other functions do not need to write into it anymore:
static unsafe void Process_Pointer_Cast_NoAlpha(int pixelCount, byte[] rgbData, byte[] rgbaData)
{
    fixed (byte* rgbPtr = &rgbData[0])
    {
        fixed (byte* rgbaPtr = &rgbaData[0])
        {
            RGB* rgb = (RGB*)rgbPtr;
            RGBA* rgba = (RGBA*)rgbaPtr;
            for (int i = 0; i < pixelCount; i++)
            {
                // Only the 3 color bytes are written; alpha keeps its pre-initialized 255.
                RGB* cp = (RGB*)rgba;
                *cp = *rgb;
                rgb++;
                rgba++;
            }
        }
    }
}
In my test (running a 1920*1080 image, 100 iterations), I get (i7, x64 release build, average running time)
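A timing harness along these lines could be used to repeat that kind of measurement on your own hardware. This is only a sketch: `ConvertBenchmark` and `AverageMilliseconds` are hypothetical names, and the random input and the single warm-up call are assumptions, not a description of how the numbers in this answer were taken.

```csharp
using System;
using System.Diagnostics;

public static class ConvertBenchmark
{
    // Hypothetical harness: returns the average time per call in milliseconds.
    public static double AverageMilliseconds(
        Action<int, byte[], byte[]> convert, int width, int height, int iterations)
    {
        if (iterations < 1) throw new ArgumentOutOfRangeException(nameof(iterations));

        int pixelCount = width * height;
        var rgbData = new byte[pixelCount * 3];
        var rgbaData = new byte[pixelCount * 4];
        new Random(0).NextBytes(rgbData);        // fixed seed so every run sees the same input

        convert(pixelCount, rgbData, rgbaData);  // warm-up call so JIT compilation is not timed

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            convert(pixelCount, rgbData, rgbaData);
        }
        stopwatch.Stop();
        return stopwatch.Elapsed.TotalMilliseconds / iterations;
    }
}
```

Any of the functions above has a matching signature, so a call such as `ConvertBenchmark.AverageMilliseconds(Process_Pointer_Cast, 1920, 1080, 100)` should work when run as an x64 release build.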
Note that all of these functions can easily be split into chunks, with each chunk processed on its own thread.
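One way to sketch the chunking is with `Parallel.For` from the Task Parallel Library. The names `ParallelRgbConverter`, `ConvertRange` and `ConvertParallel` and the `chunkCount` parameter are hypothetical. The inner loop uses plain array indexing so the example compiles without unsafe; it could be swapped for any of the pointer versions above.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelRgbConverter
{
    // Hypothetical helper: converts pixels [start, end) from packed RGB to RGBA.
    static void ConvertRange(int start, int end, byte[] rgbData, byte[] rgbaData)
    {
        for (int i = start, src = start * 3, dst = start * 4; i < end; i++, src += 3, dst += 4)
        {
            rgbaData[dst] = rgbData[src];
            rgbaData[dst + 1] = rgbData[src + 1];
            rgbaData[dst + 2] = rgbData[src + 2];
            rgbaData[dst + 3] = 255;
        }
    }

    // Splits the image into chunkCount contiguous pixel ranges and converts them in parallel.
    public static void ConvertParallel(int pixelCount, byte[] rgbData, byte[] rgbaData, int chunkCount)
    {
        if (chunkCount < 1) throw new ArgumentOutOfRangeException(nameof(chunkCount));

        int chunkSize = (pixelCount + chunkCount - 1) / chunkCount; // round up
        Parallel.For(0, chunkCount, chunk =>
        {
            int start = chunk * chunkSize;
            int end = Math.Min(start + chunkSize, pixelCount);
            // Trailing chunks can be empty when chunkCount does not divide pixelCount.
            if (start < end) ConvertRange(start, end, rgbData, rgbaData);
        });
    }
}
```

Each chunk writes to a disjoint part of `rgbaData`, so no locking is needed. A `chunkCount` near `Environment.ProcessorCount` is a reasonable starting point, but whether threading helps at all for a single 1080p frame is something to measure.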
If you need higher performance, you have two options (a bit out of scope for the question)
Upvotes: 1