I'm trying to extract frames from a video and save them to memory (RAM). With CPU decoding, I don't have any problems:
ffmpeg -i input -s 224x224 -pix_fmt bgr24 -vcodec rawvideo -an -sn -f image2pipe -
But when I try to use NVIDIA GPU decoding, I always get noisy images. I tried different commands, but the result was always the same, on both Windows and Ubuntu.
ffmpeg -hwaccel cuda -i 12.mp4 -s 224x224 -f image2pipe - -vcodec rawvideo
When saving JPGs to disk, I don't have any problems:
ffmpeg -hwaccel cuvid -c:v h264_cuvid -resize 224x224 -i {input_video} \
-vf thumbnail_cuda=2,hwdownload,format=nv12 {output_dir}/%d.jpg
Here is my Python code for testing these commands:
import cv2
import subprocess as sp
import numpy
IMG_W = 224
IMG_H = 224
input = '12.mp4'
ffmpeg_cmd = [ 'ffmpeg','-i', input,'-s', '224x224','-pix_fmt', 'bgr24', '-vcodec', 'rawvideo', '-an','-sn', '-f', 'image2pipe', '-']
#ffmpeg_cmd = ['ffmpeg','-hwaccel' ,'cuda' ,'-i' ,'12.mp4','-s', '224x224','-f' , 'image2pipe' ,'-' , '-vcodec' ,'rawvideo']
pipe = sp.Popen(ffmpeg_cmd, stdout = sp.PIPE, bufsize=10)
images = []
encode_param = [int(cv2.IMWRITE_JPEG_QUALITY), 95]
cnt = 0
while True:
    cnt += 1
    raw_image = pipe.stdout.read(IMG_W*IMG_H*3)
    image = numpy.fromstring(raw_image, dtype='uint8')  # convert read bytes to np
    if image.shape[0] == 0:
        del images
        break
    else:
        image = image.reshape((IMG_H, IMG_W, 3))

    cv2.imshow('test', image)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

    pipe.stdout.flush()

cv2.destroyAllWindows()
For accelerated H.264 decoding, it may be better to select -c:v h264_cuvid - it uses dedicated video decoding hardware in the GPU.
Testing with the GPU-Z monitoring software, it looks like -hwaccel cuda also uses the dedicated accelerator (same as -c:v h264_cuvid), but I am not sure.
Issues with the posted code:

- bufsize=10 is too small; it's better not to set the bufsize argument at all than to set bufsize=10.
- Instead of '-f', 'image2pipe', use '-f', 'rawvideo' (we are reading raw video frames from the pipe, not images like JPEG or PNG).
- We can remove '-vcodec', 'rawvideo' when using '-f', 'rawvideo'.
- We don't need the argument '-s', '224x224', because the output size is known from the input video.
Updated FFmpeg command:
ffmpeg_cmd = ['ffmpeg', '-hwaccel', 'cuda', '-c:v', 'h264_cuvid', '-i', input, '-pix_fmt', 'bgr24', '-f', 'rawvideo', '-']
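To see why reading from the rawvideo pipe works, note that every bgr24 frame is exactly IMG_W*IMG_H*3 bytes, which can be reshaped directly into an image array. A minimal sketch with synthetic bytes (no FFmpeg needed; note that numpy.frombuffer replaces the deprecated numpy.fromstring used in the question):

```python
import numpy as np

IMG_W, IMG_H = 224, 224
frame_size = IMG_W * IMG_H * 3  # bytes per BGR24 frame (one byte per channel)

# Simulated raw frame bytes (in practice they come from pipe.stdout.read(frame_size)).
raw = bytes(range(256)) * (frame_size // 256)

# Wrap the bytes as a NumPy array and reshape to rows x columns x channels.
image = np.frombuffer(raw, dtype=np.uint8).reshape((IMG_H, IMG_W, 3))
print(image.shape)  # (224, 224, 3)
```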
To create a reproducible code sample, I start by creating a synthetic video file 'test.mp4', which is used as input:
# Build synthetic video file for testing.
################################################################################
sp.run(['ffmpeg', '-y', '-f', 'lavfi', '-i', f'testsrc=size={IMG_W}x{IMG_H}:rate=1',
'-f', 'lavfi', '-i', 'sine=frequency=300', '-c:v', 'libx264', '-pix_fmt', 'nv12',
'-c:a', 'aac', '-ar', '22050', '-t', '50', input])
################################################################################
Here is a complete (executable) code sample:
import cv2
import subprocess as sp
import numpy
IMG_W = 224
IMG_H = 224
input = 'test.mp4'
# Build synthetic video file for testing.
################################################################################
sp.run(['ffmpeg', '-y', '-f', 'lavfi', '-i', f'testsrc=size={IMG_W}x{IMG_H}:rate=1',
'-f', 'lavfi', '-i', 'sine=frequency=300', '-c:v', 'libx264', '-pix_fmt', 'nv12',
'-c:a', 'aac', '-ar', '22050', '-t', '50', input])
################################################################################
# There is no harm in using both '-hwaccel cuda' and '-c:v h264_cuvid'.
ffmpeg_cmd = ['ffmpeg', '-hwaccel', 'cuda', '-c:v', 'h264_cuvid', '-i', input, '-pix_fmt', 'bgr24', '-f', 'rawvideo', '-']
pipe = sp.Popen(ffmpeg_cmd, stdout=sp.PIPE)
cnt = 0

while True:
    cnt += 1
    raw_image = pipe.stdout.read(IMG_W*IMG_H*3)
    image = numpy.frombuffer(raw_image, dtype='uint8')  # convert read bytes to np (fromstring is deprecated)
    if image.shape[0] == 0:
        break
    else:
        image = image.reshape((IMG_H, IMG_W, 3))

    cv2.imshow('test', image)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

pipe.stdout.close()
pipe.wait()
cv2.destroyAllWindows()
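One caveat about the loop above: a pipe read may return fewer bytes than requested, so pipe.stdout.read(IMG_W*IMG_H*3) can, in principle, return a partial frame. A hedged sketch of a helper that loops over short reads (read_exact is a hypothetical name; the JPEG code sample further down solves the same problem with read_from_pipe), testable with an in-memory stream:

```python
import io

def read_exact(stream, n):
    """Read exactly n bytes from stream, looping over short reads.
    Returns fewer than n bytes only when EOF is reached."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:  # EOF
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b''.join(chunks)

# Demo with an in-memory stream standing in for pipe.stdout:
stream = io.BytesIO(b'\x00' * 10)
assert len(read_exact(stream, 6)) == 6
assert len(read_exact(stream, 6)) == 4  # only 4 bytes were left before EOF
```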
Generating JPEGs instead of raw frames:
The solution I found for building a list of JPEG images in memory applies "manual" parsing of the output stream.
FFmpeg command (selecting YUV420 pixel format):
ffmpeg_cmd = ['ffmpeg', '-hwaccel', 'cuda', '-c:v', 'h264_cuvid', '-i', input, '-c:v', 'mjpeg', '-pix_fmt', 'yuvj420p', '-f', 'image2pipe', '-']
The JPEG file format does not store the length of the SOS (Start of Scan) payload in a header.
Finding the end of the SOS payload requires byte-by-byte scanning, which is very slow in a pure-Python implementation.
The following solution is irrelevant for most users; I decided to post it because it may be relevant for someone.
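If the pure-Python scan is too slow, one alternative (a sketch of my own, not part of the original answer) is to vectorize the marker search with NumPy over a buffered chunk, e.g. locating candidate EOI (FF D9) positions in one pass:

```python
import numpy as np

# Synthetic scan data containing two FF D9 patterns.
buf = b'\x01\xff\xd9\x02\xff\x00\xff\xd9'
data = np.frombuffer(buf, np.uint8)

# Indices i where data[i] == 0xFF and data[i+1] == 0xD9 (vectorized, no Python loop).
eoi_positions = np.where((data[:-1] == 0xFF) & (data[1:] == 0xD9))[0]
print(eoi_positions)  # [1 6]
```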
Here is a code sample (the first part builds a synthetic video file for testing):
import cv2
import subprocess as sp
import numpy as np
import struct

IMG_W = 224
IMG_H = 224
input = 'test.mp4'

# Build synthetic video file for testing.
################################################################################
sp.run(['ffmpeg', '-y', '-f', 'lavfi', '-i', f'testsrc=size={IMG_W}x{IMG_H}:rate=1',
        '-f', 'lavfi', '-i', 'sine=frequency=300', '-c:v', 'libx264', '-pix_fmt', 'nv12',
        '-c:a', 'aac', '-ar', '22050', '-t', '50', input])
################################################################################


def read_from_pipe(p_stdout, n_bytes):
    """ Read n_bytes bytes from p_stdout pipe, and return the read data bytes. """
    data = p_stdout.read(n_bytes)
    while len(data) < n_bytes:
        data += p_stdout.read(n_bytes - len(data))
    return data


ffmpeg_cmd = ['ffmpeg', '-hwaccel', 'cuda', '-c:v', 'h264_cuvid', '-i', input, '-c:v', 'mjpeg', '-pix_fmt', 'yuvj420p', '-f', 'image2pipe', '-']

pipe = sp.Popen(ffmpeg_cmd, stdout=sp.PIPE)

jpg_list = []
cnt = 0

while True:
    if pipe.poll() is not None:
        break

    # https://en.wikipedia.org/wiki/JPEG_File_Interchange_Format
    jpeg_parts = []

    # SOI
    soi = read_from_pipe(pipe.stdout, 2)  # Read Start of Image (FF D8)
    assert soi == b'\xff\xd8', 'Error: first two bytes are not FF D8'
    jpeg_parts.append(soi)

    # JFIF APP0 marker segment
    marker = read_from_pipe(pipe.stdout, 2)  # APP0 marker (FF E0)
    assert marker == b'\xff\xe0', 'Error: APP0 marker is not FF E0'
    jpeg_parts.append(marker)

    xx = 0

    # Keep reading markers and segments until the marker is EOI (0xFFD9)
    while xx != 0xD9:  # marker != b'\xff\xd9'
        # Length of segment excluding the marker
        length_of_segment = read_from_pipe(pipe.stdout, 2)
        jpeg_parts.append(length_of_segment)
        length_of_segment = struct.unpack('>H', length_of_segment)[0]  # Unpack to uint16 (big endian)

        segment = read_from_pipe(pipe.stdout, length_of_segment - 2)  # Read the segment (minus 2 bytes, because the length includes the 2 length bytes)
        jpeg_parts.append(segment)

        marker = read_from_pipe(pipe.stdout, 2)  # JFXX-APP0 marker (FF E0) or SOF or DHT or COM or SOS or EOI
        jpeg_parts.append(marker)

        if marker == b'\xff\xda':  # SOS marker (0xFFDA)
            # https://stackoverflow.com/questions/26715684/parsing-jpeg-sos-marker
            # Summary of how to find the next marker after the SOS marker (0xFFDA):
            #
            # 1. Skip the first 3 bytes after the SOS marker (2 bytes header size + 1 byte number of image components in scan).
            # 2. Search for the next FFxx marker (skip every FF00 and the range FFD0 to FFD7, because they are part of the scan).
            # *This is a summary of the comments below the post of user3344003 + my knowledge + Table B.1 from https://www.w3.org/Graphics/JPEG/itu-t81.pdf.
            #
            # *Based on Table B.1, I suspect that the values FF01 and FF02 through FFBF should also be skipped in step 2, but I am not sure they cannot appear as part of the encoded SOS data.
            first3bytes = read_from_pipe(pipe.stdout, 3)
            jpeg_parts.append(first3bytes)  # Skip the first 3 bytes after the SOS marker (2 bytes header size + 1 byte number of image components in scan).

            xx = 0

            # Search for the next FFxx marker, skipping every FF00, the range FFD0 to FFD7, and FF01 through FFBF
            while (xx < 0xBF) or ((xx >= 0xD0) and (xx <= 0xD7)):
                # Search for the next FFxx marker
                b = 0
                while b != 0xFF:
                    b = read_from_pipe(pipe.stdout, 1)
                    jpeg_parts.append(b)
                    b = b[0]
                xx = read_from_pipe(pipe.stdout, 1)  # Read the next byte after FF
                jpeg_parts.append(xx)
                xx = xx[0]

    # Join the list elements into a bytes array, and append it to jpg_list (converted to a NumPy array).
    jpg_list.append(np.frombuffer(b''.join(jpeg_parts), np.uint8))
    cnt += 1

pipe.stdout.close()
pipe.wait()

# Decode and show the images for testing
for im in jpg_list:
    image = cv2.imdecode(im, cv2.IMREAD_UNCHANGED)
    cv2.imshow('test', image)
    if cv2.waitKey(100) & 0xFF == ord('q'):
        break

cv2.destroyAllWindows()
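For completeness, here is a much simpler but less robust alternative sketch (my own simplification, not the parsing method above): because FF bytes inside entropy-coded scan data are always stuffed as FF 00, a concatenated MJPEG stream can often be split by scanning for EOI (FF D9) alone. This can misfire if a segment payload (e.g. an APP0 thumbnail) happens to contain FF D9, which is why the full segment parsing above is safer. Demonstrated on synthetic bytes:

```python
import io

def split_mjpeg_stream(stream):
    """Split a concatenated JPEG stream into frames by scanning for EOI (FF D9).
    Naive: assumes FF D9 never appears outside its role as end-of-image."""
    frames = []
    buf = bytearray()
    while True:
        byte = stream.read(1)
        if not byte:
            break  # EOF
        buf += byte
        if len(buf) >= 2 and buf[-2:] == b'\xff\xd9':
            frames.append(bytes(buf))
            buf = bytearray()
    return frames

# Two fake "frames": SOI ... EOI, with a stuffed FF 00 inside the first.
frame1 = b'\xff\xd8' + b'\x01\xff\x00\x02' + b'\xff\xd9'
frame2 = b'\xff\xd8' + b'\x03\x04' + b'\xff\xd9'
frames = split_mjpeg_stream(io.BytesIO(frame1 + frame2))
print(len(frames))  # 2
```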