Reputation: 55
I have a byte array with YUV420 data.
byte[] yuv420; // YUV data
How can I convert this to an Image<Bgr, byte>?
I found a formula to convert it to RGB and then to an Image<Bgr, byte>, but it is very slow. Is there a faster way to convert it?
There is a class in Emgu for this conversion, but I cannot understand how to use it. Can anyone help?
static Bitmap ConvertYUV2RGB(byte[] yuvFrame, byte[] rgbFrame, int width, int height)
{
    int uIndex = width * height;
    int vIndex = uIndex + ((width * height) >> 2);
    int gIndex = width * height;
    int bIndex = gIndex * 2;
    int temp = 0;
    // The image is pic1; int r, g, b are obtained from the binary RGB color data.
    Bitmap bm = new Bitmap(width, height);
    int r = 0;
    int g = 0;
    int b = 0;
    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
        {
            // R component
            temp = (int)(yuvFrame[y * width + x] + (yuvFrame[vIndex + (y / 2) * (width / 2) + x / 2] - 128) * YUV2RGB_CONVERT_MATRIX[0, 2]);
            rgbFrame[y * width + x] = (byte)(temp < 0 ? 0 : (temp > 255 ? 255 : temp));
            // G component
            temp = (int)(yuvFrame[y * width + x] + (yuvFrame[uIndex + (y / 2) * (width / 2) + x / 2] - 128) * YUV2RGB_CONVERT_MATRIX[1, 1] + (yuvFrame[vIndex + (y / 2) * (width / 2) + x / 2] - 128) * YUV2RGB_CONVERT_MATRIX[1, 2]);
            rgbFrame[gIndex + y * width + x] = (byte)(temp < 0 ? 0 : (temp > 255 ? 255 : temp));
            // B component
            temp = (int)(yuvFrame[y * width + x] + (yuvFrame[uIndex + (y / 2) * (width / 2) + x / 2] - 128) * YUV2RGB_CONVERT_MATRIX[2, 1]);
            rgbFrame[bIndex + y * width + x] = (byte)(temp < 0 ? 0 : (temp > 255 ? 255 : temp));
            Color c = Color.FromArgb(rgbFrame[y * width + x], rgbFrame[gIndex + y * width + x], rgbFrame[bIndex + y * width + x]);
            bm.SetPixel(x, y, c);
        }
    }
    return bm;
}

static double[,] YUV2RGB_CONVERT_MATRIX = new double[3, 3] { { 1, 0, 1.4022 }, { 1, -0.3456, -0.7145 }, { 1, 1.771, 0 } };

static byte clamp(float input)
{
    if (input < 0) input = 0;
    if (input > 255) input = 255;
    return (byte)Math.Abs(input);
}
Upvotes: 5
Views: 14261
Reputation: 151
A slightly faster variant, with two multiplications and two additions fewer per pixel:
private static unsafe void YUV2RGBManaged(byte[] YUVData, byte[] RGBData, int width, int height)
{
    // The returned pixel format is 2yuv, i.e. luminance, Y, is represented for every pixel and U and V alternate,
    // like this (where Cb = U, Cr = V):
    // Y0 Cb Y1 Cr Y2 Cb Y3
    // Note: the loop below reads each 4-byte group as Cr, Y0, Cb, Y1; adjust the indices if your source differs.
    /*
     * C = 298 * (Y - 16) + 128
     * D = U - 128
     * E = V - 128
     * R = clip(( C           + 409 * E) >> 8)
     * G = clip(( C - 100 * D - 208 * E) >> 8)
     * B = clip(( C + 516 * D          ) >> 8)
     *
     * here are a whole bunch more formats for doing this...
     */
    fixed (byte* pRGBs = RGBData, pYUVs = YUVData)
    {
        for (int r = 0; r < height; r++)
        {
            byte* pRGB = pRGBs + r * width * 3;
            byte* pYUV = pYUVs + r * width * 2;
            // process two pixels at a time
            for (int c = 0; c < width; c += 2)
            {
                // C1 and C2 already include the 298 factor and the +128 rounding term
                int C1 = 298 * (pYUV[1] - 16) + 128;
                int C2 = 298 * (pYUV[3] - 16) + 128;
                int D = pYUV[2] - 128;
                int E = pYUV[0] - 128;
                int R1 = (C1 + 409 * E) >> 8;
                int G1 = (C1 - 100 * D - 208 * E) >> 8;
                int B1 = (C1 + 516 * D) >> 8;
                int R2 = (C2 + 409 * E) >> 8;
                int G2 = (C2 - 100 * D - 208 * E) >> 8;
                int B2 = (C2 + 516 * D) >> 8;
                // check for overflow
                // unsurprisingly this takes the bulk of the time.
                pRGB[0] = (byte)(R1 < 0 ? 0 : R1 > 255 ? 255 : R1);
                pRGB[1] = (byte)(G1 < 0 ? 0 : G1 > 255 ? 255 : G1);
                pRGB[2] = (byte)(B1 < 0 ? 0 : B1 > 255 ? 255 : B1);
                pRGB[3] = (byte)(R2 < 0 ? 0 : R2 > 255 ? 255 : R2);
                pRGB[4] = (byte)(G2 < 0 ? 0 : G2 > 255 ? 255 : G2);
                pRGB[5] = (byte)(B2 < 0 ? 0 : B2 > 255 ? 255 : B2);
                pRGB += 6;
                pYUV += 4;
            }
        }
    }
}
Upvotes: 0
Reputation: 2402
I just found an old piece of code which might help you: YUV conversion using OpenCVSharp. (Disclaimer: I removed some unnecessary code and haven't tested this!)
IplImage yuvImage = new IplImage(w, h, BitDepth.U8, 3);
IplImage rgbImage = new IplImage(w, h, BitDepth.U8, 3);
Cv.CvtColor(yuvImage, rgbImage, ColorConversion.CrCbToBgr);
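To answer your other question: to convert a byte[] to a Bitmap, use this:
int w = 100;
int h = 200;
int ch = 3;
byte[] imageData = new byte[w * h * ch]; // your image data here
Bitmap bitmap = new Bitmap(w, h, PixelFormat.Format24bppRgb);
BitmapData bmData = bitmap.LockBits(new System.Drawing.Rectangle(0, 0, bitmap.Width, bitmap.Height), ImageLockMode.ReadWrite, bitmap.PixelFormat);
IntPtr pNative = bmData.Scan0;
// Copy the managed bytes into the locked bitmap (Marshal is in System.Runtime.InteropServices), then unlock.
// Assumes bmData.Stride == w * ch, i.e. no row padding; otherwise copy row by row.
Marshal.Copy(imageData, 0, pNative, w * h * ch);
bitmap.UnlockBits(bmData);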
Upvotes: 0
Reputation: 2402
You are in luck, because I solved exactly this issue before. There are some links in the code for more info.
In general, always try to use pointers when doing image processing, and avoid calling functions in nested loops. In my code the range check (clamping each value to 0..255) is by far the slowest part, but unfortunately it is needed (try switching it off using the preprocessor switch).
I have to say, though, that in the end I never used this function because it was just too slow. I opted to implement it in C++ and call it from C# using P/Invoke.
private static unsafe void YUV2RGBManaged(byte[] YUVData, byte[] RGBData, int width, int height)
{
    // The returned pixel format is 2yuv, i.e. luminance, Y, is represented for every pixel and U and V alternate,
    // like this (where Cb = U, Cr = V):
    // Y0 Cb Y1 Cr Y2 Cb Y3
    /*
     * C = Y - 16
     * D = U - 128
     * E = V - 128
     * R = clip(( 298 * C           + 409 * E + 128) >> 8)
     * G = clip(( 298 * C - 100 * D - 208 * E + 128) >> 8)
     * B = clip(( 298 * C + 516 * D           + 128) >> 8)
     *
     * here are a whole bunch more formats for doing this...
     */
    fixed (byte* pRGBs = RGBData, pYUVs = YUVData)
    {
        for (int r = 0; r < height; r++)
        {
            byte* pRGB = pRGBs + r * width * 3;
            byte* pYUV = pYUVs + r * width * 2;
            // process two pixels at a time
            for (int c = 0; c < width; c += 2)
            {
                int C1 = pYUV[1] - 16;
                int C2 = pYUV[3] - 16;
                int D = pYUV[2] - 128;
                int E = pYUV[0] - 128;
                int R1 = (298 * C1 + 409 * E + 128) >> 8;
                int G1 = (298 * C1 - 100 * D - 208 * E + 128) >> 8;
                int B1 = (298 * C1 + 516 * D + 128) >> 8;
                int R2 = (298 * C2 + 409 * E + 128) >> 8;
                int G2 = (298 * C2 - 100 * D - 208 * E + 128) >> 8;
                int B2 = (298 * C2 + 516 * D + 128) >> 8;
#if true
                // check for overflow
                // unsurprisingly this takes the bulk of the time.
                pRGB[0] = (byte)(R1 < 0 ? 0 : R1 > 255 ? 255 : R1);
                pRGB[1] = (byte)(G1 < 0 ? 0 : G1 > 255 ? 255 : G1);
                pRGB[2] = (byte)(B1 < 0 ? 0 : B1 > 255 ? 255 : B1);
                pRGB[3] = (byte)(R2 < 0 ? 0 : R2 > 255 ? 255 : R2);
                pRGB[4] = (byte)(G2 < 0 ? 0 : G2 > 255 ? 255 : G2);
                pRGB[5] = (byte)(B2 < 0 ? 0 : B2 > 255 ? 255 : B2);
#else
                // no range check: faster, but out-of-range values wrap around
                pRGB[0] = (byte)(R1);
                pRGB[1] = (byte)(G1);
                pRGB[2] = (byte)(B1);
                pRGB[3] = (byte)(R2);
                pRGB[4] = (byte)(G2);
                pRGB[5] = (byte)(B2);
#endif
                pRGB += 6;
                pYUV += 4;
            }
        }
    }
}
And in case you decide to implement this in C++:
void YUV2RGB(void *yuvDataIn, void *rgbDataOut, int w, int h, int outNCh)
{
    const int ch2 = 2 * outNCh;
    unsigned char* pRGBs = (unsigned char*)rgbDataOut;
    unsigned char* pYUVs = (unsigned char*)yuvDataIn;
    for (int r = 0; r < h; r++)
    {
        unsigned char* pRGB = pRGBs + r * w * outNCh;
        unsigned char* pYUV = pYUVs + r * w * 2;
        // process two pixels at a time
        for (int c = 0; c < w; c += 2)
        {
            int C1 = pYUV[1] - 16;
            int C2 = pYUV[3] - 16;
            int D = pYUV[2] - 128;
            int E = pYUV[0] - 128;
            int R1 = (298 * C1 + 409 * E + 128) >> 8;
            int G1 = (298 * C1 - 100 * D - 208 * E + 128) >> 8;
            int B1 = (298 * C1 + 516 * D + 128) >> 8;
            int R2 = (298 * C2 + 409 * E + 128) >> 8;
            int G2 = (298 * C2 - 100 * D - 208 * E + 128) >> 8;
            int B2 = (298 * C2 + 516 * D + 128) >> 8;
            // unsurprisingly this takes the bulk of the time.
            pRGB[0] = (unsigned char)(R1 < 0 ? 0 : R1 > 255 ? 255 : R1);
            pRGB[1] = (unsigned char)(G1 < 0 ? 0 : G1 > 255 ? 255 : G1);
            pRGB[2] = (unsigned char)(B1 < 0 ? 0 : B1 > 255 ? 255 : B1);
            // the second pixel starts outNCh bytes later (same as indices 3..5 when outNCh == 3)
            pRGB[outNCh] = (unsigned char)(R2 < 0 ? 0 : R2 > 255 ? 255 : R2);
            pRGB[outNCh + 1] = (unsigned char)(G2 < 0 ? 0 : G2 > 255 ? 255 : G2);
            pRGB[outNCh + 2] = (unsigned char)(B2 < 0 ? 0 : B2 > 255 ? 255 : B2);
            pRGB += ch2;
            pYUV += 4;
        }
    }
}
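The functions above read packed 4:2:2 data, while the question describes a yuv420 buffer. As a rough sketch only, assuming the buffer is planar I420 (a full Y plane, then a quarter-size U plane, then a quarter-size V plane, no row padding, even width and height), the same integer formula might be applied as below. The names `I420ToBGR` and `clampToByte` are made up for illustration, and the output order B, G, R is an assumption chosen to match a Bgr image.

```cpp
#include <cassert>
#include <cstdint>

// Clamp an intermediate value to the 0..255 byte range.
static inline std::uint8_t clampToByte(int v)
{
    return static_cast<std::uint8_t>(v < 0 ? 0 : (v > 255 ? 255 : v));
}

// Hypothetical helper: converts one I420 frame to packed BGR, 3 bytes per pixel.
// Assumes even width/height and tightly packed planes (no stride padding).
void I420ToBGR(const std::uint8_t* yuv, std::uint8_t* bgr, int width, int height)
{
    assert(width % 2 == 0 && height % 2 == 0);

    const std::uint8_t* yPlane = yuv;
    const std::uint8_t* uPlane = yPlane + width * height;
    const std::uint8_t* vPlane = uPlane + (width / 2) * (height / 2);

    for (int r = 0; r < height; r++)
    {
        const std::uint8_t* yRow = yPlane + r * width;
        // each chroma row is shared by two consecutive luma rows
        const std::uint8_t* uRow = uPlane + (r / 2) * (width / 2);
        const std::uint8_t* vRow = vPlane + (r / 2) * (width / 2);
        std::uint8_t* out = bgr + r * width * 3;

        // each U/V sample is shared by two horizontally adjacent pixels
        for (int c = 0; c < width; c += 2)
        {
            const int D = uRow[c / 2] - 128;
            const int E = vRow[c / 2] - 128;
            const int rAdd = 409 * E + 128;
            const int gAdd = -100 * D - 208 * E + 128;
            const int bAdd = 516 * D + 128;

            for (int k = 0; k < 2; k++)
            {
                const int C = 298 * (yRow[c + k] - 16);
                out[0] = clampToByte((C + bAdd) >> 8);
                out[1] = clampToByte((C + gAdd) >> 8);
                out[2] = clampToByte((C + rAdd) >> 8);
                out += 3;
            }
        }
    }
}
```

The chroma terms are computed once per pixel pair, so the inner work per pixel is one multiplication, three additions, three shifts and the clamps. If your frames come with padded rows or a different plane order (for example NV12 or YV12), the plane pointers would need adjusting.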
Upvotes: 7
Reputation: 22133
The biggest offender in that code is the use of Bitmap.SetPixel; it is very slow to do this on every inner loop iteration. Instead, use a byte array to store your RGB values and once it is filled, copy it into a bitmap as a single step.
Secondly, understand that y, u and v are bytes, and so can only have 256 possible values. It is therefore perfectly feasible to build lookup tables for r, g and b, so you don't have to perform any computations in your inner loop.
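To illustrate the lookup-table idea, here is a minimal sketch, assuming the integer coefficients (298, 409, 100, 208, 516) used in the other answers in this thread. The names `YuvTables`, `saturate` and `yuvToBgr` are hypothetical. Note that G depends on both U and V, so this version keeps two small tables for it and adds them, rather than building one 256 x 256 table.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical table set: one entry per possible byte value, so the
// per-pixel work is reduced to table reads, additions, a shift and a clamp.
struct YuvTables
{
    int y[256];   // 298 * (Y - 16) + 128
    int rv[256];  // 409 * (V - 128)
    int gu[256];  // -100 * (U - 128)
    int gv[256];  // -208 * (V - 128)
    int bu[256];  // 516 * (U - 128)

    YuvTables()
    {
        for (int i = 0; i < 256; i++)
        {
            y[i]  = 298 * (i - 16) + 128;
            rv[i] = 409 * (i - 128);
            gu[i] = -100 * (i - 128);
            gv[i] = -208 * (i - 128);
            bu[i] = 516 * (i - 128);
        }
    }
};

// Clamp an intermediate value to the 0..255 byte range.
static inline std::uint8_t saturate(int v)
{
    return static_cast<std::uint8_t>(v < 0 ? 0 : (v > 255 ? 255 : v));
}

// Converts one pixel; out receives B, G, R in that order (assumed output order).
void yuvToBgr(const YuvTables& t, std::uint8_t Y, std::uint8_t U, std::uint8_t V,
              std::uint8_t* out)
{
    assert(out != nullptr);
    const int c = t.y[Y];
    out[0] = saturate((c + t.bu[U]) >> 8);
    out[1] = saturate((c + t.gu[U] + t.gv[V]) >> 8);
    out[2] = saturate((c + t.rv[V]) >> 8);
}
```

Build the tables once (for example as a static object) and reuse them for every frame. Whether this actually beats plain integer multiplication depends on the CPU and cache behaviour, so it is worth measuring both.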
Finally, if you really want performance you'll have to write this in C++ using pointer arithmetic and compile with all optimizations on. This loop is also a very good candidate for a parallel for since every iteration operates on independent data. It is also possible to optimize this further with SSE intrinsics, converting several pixels per instruction.
Hopefully this should get you started.
Upvotes: 0