Speсtra

Reputation: 35

How to save frames from a video to memory with FFmpeg GPU decoding?

I'm trying to extract frames from a video and save them to memory (RAM). With CPU decoding, I don't have any problems:

ffmpeg -i input -s 224x224 -pix_fmt bgr24 -vcodec rawvideo -an -sn -f image2pipe -

But when I try to use NVIDIA GPU decoding, I always get noisy images. I tried different commands, but the result was always the same, on both Windows and Ubuntu.

ffmpeg -hwaccel cuda -i 12.mp4 -s 224x224 -f image2pipe - -vcodec rawvideo

When saving JPEGs to disk, I don't have any problems:

ffmpeg -hwaccel cuvid -c:v h264_cuvid -resize 224x224 -i {input_video} \
     -vf thumbnail_cuda=2,hwdownload,format=nv12 {output_dir}/%d.jpg

Here is my Python code for testing these commands:

import cv2
import subprocess as sp
import numpy

IMG_W = 224
IMG_H = 224
input = '12.mp4'

ffmpeg_cmd = [ 'ffmpeg','-i', input,'-s', '224x224','-pix_fmt', 'bgr24',  '-vcodec', 'rawvideo', '-an','-sn', '-f', 'image2pipe', '-']


#ffmpeg_cmd = ['ffmpeg','-hwaccel' ,'cuda' ,'-i' ,'12.mp4','-s', '224x224','-f' , 'image2pipe' ,'-' , '-vcodec' ,'rawvideo']

pipe = sp.Popen(ffmpeg_cmd, stdout = sp.PIPE, bufsize=10)
images = []
encode_param = [int(cv2.IMWRITE_JPEG_QUALITY), 95]
cnt = 0
while True:
    cnt += 1
    raw_image = pipe.stdout.read(IMG_W*IMG_H*3)
    image = numpy.frombuffer(raw_image, dtype='uint8')  # Convert the read bytes to a NumPy array (fromstring is deprecated)
    if image.shape[0] == 0:
        del images
        break   
    else:
        image = image.reshape((IMG_H,IMG_W,3))
        

    cv2.imshow('test',image)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

pipe.stdout.flush()
cv2.destroyAllWindows()

Upvotes: 3

Views: 5495

Answers (1)

Rotem

Reputation: 32084

For accelerating H.264 decoding, it may be better to select -c:v h264_cuvid - it uses the dedicated video decoding hardware in the GPU.

Testing with the GPU-Z monitoring software, it looks like -hwaccel cuda also uses the dedicated decoder (same as -c:v h264_cuvid), but I am not sure.
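
On Linux, where GPU-Z is not available, a rough equivalent is watching the dec (decoder utilization) column of nvidia-smi dmon while FFmpeg runs:

nvidia-smi dmon -s u

A non-zero dec value during decoding suggests the dedicated video decoder is in use.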

Note:

  • The NVIDIA CUVID video decoding accelerator does not support all frame sizes and pixel formats.
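
A quick way to check whether a given input is supported by the hardware decoder is decoding it to a null output and watching for errors (a sanity check I am adding here; replace 12.mp4 with your input):

ffmpeg -hwaccel cuda -c:v h264_cuvid -i 12.mp4 -f null -

If the size or pixel format is unsupported, FFmpeg reports an error instead of decoding.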

Issues:

  • bufsize=10 is too small; it's better not to set the bufsize argument at all than to set it to 10 bytes.

  • Instead of '-f', 'image2pipe', use '-f', 'rawvideo' (we are reading raw video frames from the pipe, not encoded images like JPEG or PNG).
    We can remove '-vcodec', 'rawvideo' when using '-f', 'rawvideo'.

  • We don't need the argument '-s', '224x224', because the output size is known from the input video (a sketch for reading it with ffprobe follows the updated command below).

Updated FFmpeg command:

ffmpeg_cmd = ['ffmpeg', '-hwaccel', 'cuda', '-c:v', 'h264_cuvid', '-i', input, '-pix_fmt', 'bgr24', '-f', 'rawvideo', '-']
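
Since -s 224x224 was removed, the reading side has to know the frame size in advance. A minimal sketch for reading it with ffprobe (assuming ffprobe is available in the PATH, and the first video stream is the one we decode):

import subprocess as sp

def get_video_size(path):
    """ Return (width, height) of the first video stream, using ffprobe. """
    out = sp.run(['ffprobe', '-v', 'error', '-select_streams', 'v:0',
                  '-show_entries', 'stream=width,height', '-of', 'csv=p=0', path],
                 capture_output=True, text=True).stdout
    w, h = out.strip().split(',')
    return int(w), int(h)

IMG_W, IMG_H = get_video_size(input)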

To make the code sample reproducible, I start by creating a synthetic video file 'test.mp4' that is going to be used as input:

# Build synthetic video file for testing.
################################################################################
sp.run(['ffmpeg', '-y', '-f', 'lavfi', '-i', f'testsrc=size={IMG_W}x{IMG_H}:rate=1',
        '-f', 'lavfi', '-i', 'sine=frequency=300', '-c:v', 'libx264', '-pix_fmt', 'nv12', 
        '-c:a', 'aac', '-ar', '22050', '-t', '50', input])
################################################################################

Here is a complete (executable) code sample:

import cv2
import subprocess as sp
import numpy


IMG_W = 224
IMG_H = 224
input = 'test.mp4'

# Build synthetic video file for testing.
################################################################################
sp.run(['ffmpeg', '-y', '-f', 'lavfi', '-i', f'testsrc=size={IMG_W}x{IMG_H}:rate=1',
        '-f', 'lavfi', '-i', 'sine=frequency=300', '-c:v', 'libx264', '-pix_fmt', 'nv12', 
        '-c:a', 'aac', '-ar', '22050', '-t', '50', input])
################################################################################

# There is no harm in using both '-hwaccel cuda' and '-c:v h264_cuvid'.
ffmpeg_cmd = ['ffmpeg', '-hwaccel', 'cuda', '-c:v', 'h264_cuvid', '-i', input, '-pix_fmt', 'bgr24', '-f', 'rawvideo', '-']
   
pipe = sp.Popen(ffmpeg_cmd, stdout=sp.PIPE)

cnt = 0
while True:
    cnt += 1
    raw_image = pipe.stdout.read(IMG_W*IMG_H*3)
    image = numpy.frombuffer(raw_image, dtype='uint8')  # Convert the read bytes to a NumPy array (fromstring is deprecated)
    if image.shape[0] == 0:
        break
    else:
        image = image.reshape((IMG_H, IMG_W, 3))
        
    cv2.imshow('test', image)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

pipe.stdout.close()
pipe.wait()
cv2.destroyAllWindows()
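
One defensive tweak I would add (my suggestion, not strictly required): at end of stream, pipe.stdout.read(n) may return fewer than n bytes, and reshaping a partial frame raises an error. Breaking on a short read avoids that:

while True:
    raw_image = pipe.stdout.read(IMG_W*IMG_H*3)
    if len(raw_image) < IMG_W*IMG_H*3:
        break  # End of stream (or a truncated frame) - stop before reshape fails
    image = numpy.frombuffer(raw_image, dtype='uint8').reshape((IMG_H, IMG_W, 3))
    cv2.imshow('test', image)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break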

Update:

Generating JPEG's instead of raw frames:

The solution I found for building a list of JPEG images in memory applies "manual" parsing of the output stream.

FFmpeg command (selecting YUV420 pixel format):

ffmpeg_cmd = ['ffmpeg', '-hwaccel', 'cuda', '-c:v', 'h264_cuvid', '-i', input, '-c:v', 'mjpeg', '-pix_fmt', 'yuvj420p', '-f', 'image2pipe', '-']

The JPEG file format stores no length for the SOS payload in its header.
Finding the end of the SOS payload therefore requires byte-by-byte scanning, which is very slow in a pure Python implementation.

The following solution is probably irrelevant for most users; I decided to post it because it may help someone.

Here is a code sample (the first part builds a synthetic video file for testing):

import cv2
import subprocess as sp
import numpy as np
import struct


IMG_W = 224
IMG_H = 224
input = 'test.mp4'

# Build synthetic video file for testing.
################################################################################
sp.run(['ffmpeg', '-y', '-f', 'lavfi', '-i', f'testsrc=size={IMG_W}x{IMG_H}:rate=1',
         '-f', 'lavfi', '-i', 'sine=frequency=300', '-c:v', 'libx264', '-pix_fmt', 'nv12',
         '-c:a', 'aac', '-ar', '22050', '-t', '50', input])
################################################################################

def read_from_pipe(p_stdout, n_bytes):
    """ Read n_bytes bytes from p_stdout pipe, and return the read data bytes. """
    data = p_stdout.read(n_bytes)
    while len(data) < n_bytes:
        chunk = p_stdout.read(n_bytes - len(data))
        if not chunk:
            break  # EOF reached - avoid spinning forever on empty reads
        data += chunk

    return data


ffmpeg_cmd = ['ffmpeg', '-hwaccel', 'cuda', '-c:v', 'h264_cuvid', '-i', input, '-c:v', 'mjpeg', '-pix_fmt', 'yuvj420p', '-f', 'image2pipe', '-']

pipe = sp.Popen(ffmpeg_cmd, stdout=sp.PIPE)

jpg_list = []

cnt = 0
while True:
    if pipe.poll() is not None:
        break

    # https://en.wikipedia.org/wiki/JPEG_File_Interchange_Format
    jpeg_parts = []

    # SOI
    soi = read_from_pipe(pipe.stdout, 2)  # Read Start of Image (FF D8)
    assert soi == b'\xff\xd8', 'Error: first two bytes are not FF D8'
    jpeg_parts.append(soi)

    # JFIF APP0 marker segment
    marker = read_from_pipe(pipe.stdout, 2)  # APP0 marker (FF E0)
    assert marker == b'\xff\xe0', 'Error: APP0 marker is not FF E0'
    jpeg_parts.append(marker)

    xx = 0

    # Keep reading markers and segments until marker is EOI (0xFFD9)
    while xx != 0xD9:  # marker != b'\xff\xd9':
        # Length of segment excluding APP0 marker
        length_of_segment = read_from_pipe(pipe.stdout, 2)
        jpeg_parts.append(length_of_segment)
        length_of_segment = struct.unpack('>H', length_of_segment)[0]  # Unpack to uint16 (big endian)

        segment = read_from_pipe(pipe.stdout, length_of_segment - 2)  # Read the segment (minus 2 bytes because length includes the 2 bytes of length)
        jpeg_parts.append(segment)

        marker = read_from_pipe(pipe.stdout, 2)  # JFXX-APP0 marker (FF E0) or SOF or DHT or COM or SOS or EOI
        jpeg_parts.append(marker)

        if marker == b'\xff\xda':  # SOS marker (0xFFDA)
            # https://stackoverflow.com/questions/26715684/parsing-jpeg-sos-marker
            # Summary of how to find next marker after SOS marker (0xFFDA):
            #
            # Skip first 3 bytes after SOS marker (2 bytes header size + 1 byte number of image components in scan).
            # Search for next FFxx marker (skip every FF00 and range from FFD0 to FFD7 because they are part of scan).
            # *This is summary of comments below post of user3344003 + my knowledge + Table B.1 from https://www.w3.org/Graphics/JPEG/itu-t81.pdf.
            #
            # *Basing on Table B.1 I can also suspect that values FF01 and FF02 through FFBF should also be skipped in point 2 but I am not sure if they cannot appear as part of encoded SOS data.
            first3bytes = read_from_pipe(pipe.stdout, 3)
            jpeg_parts.append(first3bytes)  # Skip first 3 bytes after SOS marker (2 bytes header size + 1 byte number of image components in scan).

            xx = 0

            # Search for next FFxx marker, skip every FF00 and range from FFD0 to FFD7 and FF01 and FF02 through FFBF
            while (xx < 0xBF) or ((xx >= 0xD0) and (xx <= 0xD7)):
                # Search for next FFxx marker
                b = 0
                while b != 0xFF:
                    b = read_from_pipe(pipe.stdout, 1)
                    jpeg_parts.append(b)
                    b = b[0]
            
                xx = read_from_pipe(pipe.stdout, 1)  # Read next byte after FF
                jpeg_parts.append(xx)
                xx = xx[0]

    # Join list parts elements to bytes array, and append the bytes array to jpg_list (convert to NumPy array).
    jpg_list.append(np.frombuffer(b''.join(jpeg_parts), np.uint8))

    cnt += 1


pipe.stdout.close()
pipe.wait()


# Decode and show images for testing
for im in jpg_list:
    image = cv2.imdecode(im, cv2.IMREAD_UNCHANGED)

    cv2.imshow('test', image)
    if cv2.waitKey(100) & 0xFF == ord('q'):
        break

cv2.destroyAllWindows()
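
If the goal is simply a list of in-memory JPEGs, a simpler alternative (my suggestion, not part of the solution above) is reading raw BGR frames as in the first code sample and compressing each one with cv2.imencode; the trade-off is that the JPEG compression then runs on the CPU in the Python process:

ret, jpg = cv2.imencode('.jpg', image, [int(cv2.IMWRITE_JPEG_QUALITY), 95])  # Compress one BGR frame to JPEG in memory
if ret:
    jpg_list.append(jpg)  # jpg is a NumPy array holding the encoded JPEG bytes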

Upvotes: 5
