nathan wu

Reputation: 97

How to write an image from yuv420 format data with PIL or a similar library

I have a video in yuv420p pixel format. At first I read the bytes of each frame through a pipe, with the pixel format set to rgb24, and used PIL to build an image from them. However, the frames read as rgb24 seem to lose a little quality.

Here is the command for reading frames in rgb24 pixel format, followed by the Python call that reads one frame from the pipe:

    ffmpeg -y -i input.mp4 -vcodec rawvideo -pix_fmt rgb24 -an -r 25 -f rawvideo pipe:1
    frame_data = self.process.stdout.read(1920*1080*3)

Then I tried to read the frames in yuv420p pixel format instead:

    ffmpeg -y -i input.mp4 -vcodec rawvideo -pix_fmt yuv420p -an -r 25 -f rawvideo pipe:1
    frame_data = self.process.stdout.read(1920*1080*3//2)

A single frame contains half as many bytes as an rgb24 frame: a 1920*1080 yuv420p frame is 3110400 bytes. I passed this data to PIL:

    Image.frombytes('YCbCr', (1920, 1080), frame_data)

but PIL raises a "not enough image data" error. I looked up the modes that PIL can build from bytes, and none of them uses 12 bits per pixel. I also tried converting the YUV data to RGB myself, but that took much longer than before, which matters when the video is long.

Am I doing something wrong? Is there any way to write an image from raw YUV data without any conversion?

Upvotes: 1

Views: 3084

Answers (1)

Mark Setchell

Reputation: 207385

Your YUV420p is chroma sub-sampled and "planar". The sub-sampling means that the U and V channels are each half the width and half the height of the full-resolution Y channel, so each holds only 1/4 as many samples as a full-size channel. And because it's planar, you will actually receive:

  • whole Y channel, followed by
  • 1/4 size U channel, followed by
  • 1/4 size V channel

which means that, relative to an RGB image, you will have 1 whole and 2 quarter-size channels, i.e. 1.5 channels, which is half what you would have with 3 full RGB channels. That is why it takes 12 bits per pixel rather than 24.
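The arithmetic can be checked with a few lines of Python. This is only a sketch; the helper name `yuv420p_plane_sizes` is made up for illustration, and it assumes an even width and height:

```python
def yuv420p_plane_sizes(width, height):
    """Return the byte sizes of the Y, U and V planes of one 8-bit yuv420p frame.

    Assumes width and height are both even.
    """
    y_size = width * height                      # full-resolution luma
    chroma_size = (width // 2) * (height // 2)   # each chroma plane is 1/4 size
    return y_size, chroma_size, chroma_size


y_size, u_size, v_size = yuv420p_plane_sizes(1920, 1080)
frame_bytes = y_size + u_size + v_size
bits_per_pixel = frame_bytes * 8 / (1920 * 1080)

print(y_size, u_size, v_size)  # 2073600 518400 518400
print(frame_bytes)             # 3110400
print(bits_per_pixel)          # 12.0
```

The total matches the 3110400 bytes per frame reported in the question.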

PIL doesn't support sub-sampled chroma natively. So, in order to read your data, you could:

  • read a full-resolution Y channel into a PIL L mode image
  • read a w/2 x h/2 resolution U channel into a PIL L mode image, and resize it to double the width and height
  • read a w/2 x h/2 resolution V channel into a PIL L mode image, and resize it to double the width and height

Then merge those three single-channel images into one 3-channel YCbCr image.
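A minimal sketch of those steps with Pillow might look like the following. The function name `yuv420p_to_image` is hypothetical, the bilinear filter for upscaling the chroma is just one reasonable choice, and the code assumes 8-bit samples with even width and height. Note also that Pillow's YCbCr mode follows the JPEG (full-range) convention, so colours may be slightly off for typical limited-range video if the image is later converted to RGB.

```python
from PIL import Image


def yuv420p_to_image(frame_data, width, height):
    """Build a Pillow 'YCbCr' image from one planar yuv420p frame.

    Assumes 8-bit samples and even width/height (hypothetical helper).
    """
    y_size = width * height
    c_width, c_height = width // 2, height // 2
    c_size = c_width * c_height
    if len(frame_data) != y_size + 2 * c_size:
        raise ValueError("unexpected frame size for yuv420p")

    # Planar layout: whole Y plane, then quarter-size U, then quarter-size V.
    y = Image.frombytes("L", (width, height), frame_data[:y_size])
    u = Image.frombytes("L", (c_width, c_height),
                        frame_data[y_size:y_size + c_size])
    v = Image.frombytes("L", (c_width, c_height),
                        frame_data[y_size + c_size:])

    # Upscale the chroma planes to full resolution before merging.
    u = u.resize((width, height), Image.BILINEAR)
    v = v.resize((width, height), Image.BILINEAR)

    return Image.merge("YCbCr", (y, u, v))
```

With the pipe from the question, this could be called as `yuv420p_to_image(frame_data, 1920, 1080)`. The chroma upscaling is still a conversion step, so this does not avoid processing entirely.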

It is unclear to me why you are using PIL at all though. If you just want to write an unprocessed, raw YUV420p stream to disk, let ffmpeg do it itself.
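As a rough sketch of that last suggestion, ffmpeg can be asked to write the raw frames to a file instead of a pipe. The helper names and the file names below are placeholders, not anything from the question; the flags mirror the ones already used there.

```python
import subprocess


def build_raw_yuv_command(src, dst):
    """Return an ffmpeg command that writes raw yuv420p frames from src to dst."""
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-an",                  # drop audio
        "-c:v", "rawvideo",     # no video compression
        "-pix_fmt", "yuv420p",  # keep planar 4:2:0 layout
        "-f", "rawvideo",       # headerless output, frames back to back
        dst,
    ]


def dump_raw_yuv(src, dst):
    """Run ffmpeg (must be on PATH) to dump src as raw yuv420p into dst."""
    subprocess.run(build_raw_yuv_command(src, dst), check=True)
```

Calling `dump_raw_yuv("input.mp4", "frames.yuv")` would produce a file of concatenated frames, each 3110400 bytes for 1920x1080 video. The file has no header, so the width, height and pixel format have to be known when reading it back.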

Upvotes: 1
