Reputation: 3260
I am using python to do some basic image processing, and want to extend it to process a video frame by frame.
I get the video as a blob from a server - .webm encoded - and have it in python as a byte string (b'\x1aE\xdf\xa3\xa3B\x86\x81\x01B\xf7\x81\x01B\xf2\x81\x04B\xf3\x81\x08B\x82\x88matroskaB\x87\x81\x04B\x85\x81\x02\x18S\x80g\x01\xff\xff\xff\xff\xff\xff\xff\x15I\xa9f\x99*\xd7\xb1\x83\x0fB@M\x80\x86ChromeWA\x86Chrome\x16T\xaek\xad\xae\xab\xd7\x81\x01s\xc5\x87\x04\xe8\xfc\x16\t^\x8c\x83\x81\x01\x86\x8fV_MPEG4/ISO/AVC\xe0\x88\xb0\x82\x02\x80\xba\x82\x01\xe0\x1fC\xb6u\x01\xff\xff\xff\xff\xff\xff ...
).
I know that there is cv.VideoCapture, which can do almost what I need. The problem is that I would first have to write the file to disk and then load it again. It seems much cleaner to wrap the byte string in, e.g., an IO stream and feed it to some function that does the decoding.
Is there a clean way to do this in python, or is writing to disk and loading it again the way to go?
Upvotes: 7
Views: 16578
Reputation: 36
There is a Pythonic way to do this using the decord package.
import io
from decord import VideoReader
# video_str is the bytes object holding your video.
# Load the video from an in-memory file object
file_obj = io.BytesIO(video_str)
container = VideoReader(file_obj)
# Get the total number of video frames
num_frames = len(container)
# Access the NDArray of the (i+1)-th frame (i is a zero-based index)
i = 0
frame = container[i]
You can learn more about decord in the decord GitHub repo.
You can learn more about video IO in the mmaction repo. See DecordInit for using decord IO.
Upvotes: 0
Reputation: 3260
Two years after Rotem wrote his answer there is now a cleaner / easier way to do this using ImageIO.
Note: Assuming ffmpeg is in your path, you can generate a test video to try this example using: ffmpeg -f lavfi -i testsrc=duration=10:size=1280x720:rate=30 testsrc.webm
import imageio.v3 as iio
from pathlib import Path
webm_bytes = Path("testsrc.webm").read_bytes()
# read all frames from the bytes string
frames = iio.imread(webm_bytes, index=None, format_hint=".webm")
frames.shape
# Output:
# (300, 720, 1280, 3)
for frame in iio.imiter(webm_bytes, format_hint=".webm"):
    print(frame.shape)
# Output:
# (720, 1280, 3)
# (720, 1280, 3)
# (720, 1280, 3)
# ...
To use this you'll need the ffmpeg backend (which implements a solution similar to what Rotem proposed): pip install imageio[ffmpeg]
In response to Rotem's comment a bit of explanation:
The above snippet uses imageio==2.16.0. The v3 API is an upcoming user-facing API that streamlines reading and writing. The API has been available since imageio==2.10.0; however, on versions older than 2.16.0 you will have to use import imageio as iio and call iio.v3.imiter and iio.v3.imread.
The ability to read video bytes has existed forever (>5 years and counting) but has (as I am just now realizing) never been documented directly ... so I will add a PR for that soon™ :)
On older versions (tested on v2.9.0) of ImageIO (v2 API) you can still read video byte strings; however, this is slightly more verbose:
import imageio as iio
import numpy as np
from pathlib import Path
webm_bytes = Path("testsrc.webm").read_bytes()
# read all frames from the bytes string
frames = np.stack(iio.mimread(webm_bytes, format="FFMPEG", memtest=False))
# iterate over frames one by one
reader = iio.get_reader(webm_bytes, format="FFMPEG")
for frame in reader:
    print(frame.shape)
reader.close()
Upvotes: 6
Reputation: 32114
According to this post, you can't use cv.VideoCapture to decode an in-memory stream.
You can decode the stream by "piping" it to FFmpeg.
The solution is a bit complicated; writing to disk is much simpler and probably the cleaner solution.
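For comparison, the write-to-disk route can be sketched as below. This is a minimal sketch using only the standard library; `bytes_as_tempfile` is a hypothetical helper name (not part of any library), and the placeholder bytes stand in for the real video blob. The file is created with `delete=False` and removed manually, on the assumption that the reader needs to reopen it by name, which also has to work on Windows.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def bytes_as_tempfile(data, suffix=".webm"):
    """Write `data` to a temporary file and yield its path (hypothetical helper)."""
    # delete=False so that another reader can reopen the file by name on Windows too.
    tmp = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    try:
        tmp.write(data)
        tmp.close()  # flush and release the handle before a reader opens the path
        yield tmp.name
    finally:
        tmp.close()  # harmless if the file is already closed
        os.unlink(tmp.name)  # remove the file once the caller is done


# Placeholder bytes standing in for the real video blob (not a decodable video).
video_bytes = b"\x1aE\xdf\xa3 placeholder, not a real video"

with bytes_as_tempfile(video_bytes) as path:
    # A path-based reader would be used here, e.g. cv2.VideoCapture(path).
    with open(path, "rb") as f:
        round_trip = f.read()
    existed_inside = os.path.exists(path)

removed_after = not os.path.exists(path)
```

The context manager keeps the temporary file alive only for as long as the decoder needs it, so the disk round trip stays contained in one `with` block.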
I am posting a solution using FFmpeg (and FFprobe).
There are Python bindings for FFmpeg, but this solution executes FFmpeg as an external application using the subprocess module.
(The Python binding works well with FFmpeg, but piping to FFprobe does not.)
I am using Windows 10, and I put ffmpeg.exe and ffprobe.exe in the execution folder (you may set the execution path as well).
For Windows, download the latest (statically linked) stable version.
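I created a standalone example that performs the following:
- Generate a synthetic video and save it as a WebM file (used as input for testing).
- Read the file into memory as binary data.
- Pipe the binary stream to FFprobe to find the video resolution (skip this part if the resolution is known).
- Pipe the binary stream to FFmpeg stdin for decoding, and read decoded raw frames from the stdout pipe.
- Writing to stdin is done in chunks using a Python thread.
(The reason for using stdin and stdout instead of named pipes is for Windows compatibility.)
Piping architecture: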
 --------------------   Encoded     ---------   Decoded      -------------
| Input WebM encoded |   data      | ffmpeg  |  raw frames  | reshape to  |
| stream (VP9 codec) | ----------> | process | -----------> | NumPy array |
 --------------------  stdin PIPE   ---------  stdout PIPE   -------------
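The pipe layout in the diagram can be tried without FFmpeg. The following is a minimal sketch under stated assumptions: `pipe_through` is a hypothetical helper name, and the child process is a small Python one-liner that copies stdin to stdout, standing in for the ffmpeg process. The main thread reads stdout while a separate thread writes stdin in chunks, so neither pipe buffer can fill up and stall the other side.

```python
import io
import subprocess as sp
import sys
import threading
from functools import partial


def pipe_through(cmd, data, chunk_size=1024):
    """Feed `data` to the stdin of `cmd` from a thread and return all of its stdout."""
    process = sp.Popen(cmd, stdin=sp.PIPE, stdout=sp.PIPE)
    stream = io.BytesIO(data)

    def writer():
        try:
            # Read the in-memory stream in fixed-size chunks until it is exhausted.
            for chunk in iter(partial(stream.read, chunk_size), b""):
                process.stdin.write(chunk)
        except OSError:
            pass  # the child closed its stdin early (e.g. BrokenPipeError)
        finally:
            try:
                process.stdin.close()  # signals end of input to the child
            except OSError:
                pass

    thread = threading.Thread(target=writer)
    thread.start()
    output = process.stdout.read()  # drain stdout while the writer thread is running
    thread.join()
    process.wait()
    return output


# Stand-in for ffmpeg: a child process that copies stdin to stdout unchanged.
echo_cmd = [
    sys.executable,
    "-c",
    "import sys, shutil; shutil.copyfileobj(sys.stdin.buffer, sys.stdout.buffer)",
]
payload = bytes(range(256)) * 1000  # 256000 bytes, larger than a typical pipe buffer
result = pipe_through(echo_cmd, payload)
```

Swapping `echo_cmd` for an ffmpeg command line gives the same data flow as the diagram; in that case the output would be read one frame at a time rather than all at once.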
Here is the code:
import numpy as np
import cv2
import io
import subprocess as sp
import threading
import json
from functools import partial
import shlex
# Build synthetic video and read binary data into memory (for testing):
#########################################################################
width, height = 640, 480
sp.run(shlex.split('ffmpeg -y -f lavfi -i testsrc=size={}x{}:rate=1 -vcodec vp9 -crf 23 -t 50 test.webm'.format(width, height)))
with open('test.webm', 'rb') as binary_file:
    in_bytes = binary_file.read()
#########################################################################
# https://stackoverflow.com/questions/5911362/pipe-large-amount-of-data-to-stdin-while-using-subprocess-popen/14026178
# https://stackoverflow.com/questions/15599639/what-is-the-perfect-counterpart-in-python-for-while-not-eof
# Write to stdin in chunks of 1024 bytes.
def writer():
    for chunk in iter(partial(stream.read, 1024), b''):
        process.stdin.write(chunk)
    try:
        process.stdin.close()
    except (BrokenPipeError):
        pass  # For unknown reason there is a Broken Pipe Error when executing FFprobe.
# Get resolution of video frames using FFprobe
# (in case resolution is known, skip this part):
################################################################################
# Open In-memory binary streams
stream = io.BytesIO(in_bytes)
process = sp.Popen(shlex.split('ffprobe -v error -i pipe: -select_streams v -print_format json -show_streams'), stdin=sp.PIPE, stdout=sp.PIPE, bufsize=10**8)
pthread = threading.Thread(target=writer)
pthread.start()
pthread.join()
in_bytes = process.stdout.read()
process.wait()
p = json.loads(in_bytes)
width = (p['streams'][0])['width']
height = (p['streams'][0])['height']
################################################################################
# Decoding the video using FFmpeg:
################################################################################
stream.seek(0)
# FFmpeg input PIPE: WebM encoded data as stream of bytes.
# FFmpeg output PIPE: decoded video frames in BGR format.
process = sp.Popen(shlex.split('ffmpeg -i pipe: -f rawvideo -pix_fmt bgr24 -an -sn pipe:'), stdin=sp.PIPE, stdout=sp.PIPE, bufsize=10**8)
thread = threading.Thread(target=writer)
thread.start()
# Read decoded video (frame by frame), and display each frame (using cv2.imshow)
while True:
    # Read raw video frame from stdout as bytes array.
    in_bytes = process.stdout.read(width * height * 3)
    if not in_bytes:
        break  # Break loop if no more bytes.
    # Transform the bytes read into a NumPy array
    in_frame = (np.frombuffer(in_bytes, np.uint8).reshape([height, width, 3]))
    # Display the frame (for testing)
    cv2.imshow('in_frame', in_frame)
    if cv2.waitKey(100) & 0xFF == ord('q'):
        break
if not in_bytes:
    # Wait for the thread to end only if the loop was not exited by pressing 'q'
    thread.join()
try:
    process.wait(1)
except (sp.TimeoutExpired):
    process.kill()  # In case 'q' is pressed.
################################################################################
cv2.destroyAllWindows()
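Remark: if the ffmpeg executable is not found, try using its full path in the command, for example (on Linux):
'/usr/bin/ffmpeg -i pipe: -f rawvideo -pix_fmt bgr24 -an -sn pipe:'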
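The two bookkeeping steps in the code above (taking the resolution from the FFprobe JSON, then cutting the raw output into frames of width * height * 3 bytes) can be checked in isolation. This is a minimal sketch, not the answer's code: `video_size_from_probe` and `iter_raw_frames` are hypothetical helper names, the JSON string is a hand-written sample rather than real FFprobe output, and an in-memory buffer stands in for the stdout pipe. It assumes a buffered stream whose read(n) returns fewer than n bytes only at end of stream.

```python
import io
import json


def video_size_from_probe(probe_json):
    """Return (width, height) of the first video stream in FFprobe JSON output."""
    info = json.loads(probe_json)
    for stream in info.get("streams", []):
        if stream.get("codec_type") == "video":
            return stream["width"], stream["height"]
    raise ValueError("no video stream found in probe output")


def iter_raw_frames(stream, width, height, channels=3):
    """Yield one bytes object per complete raw frame read from `stream`."""
    frame_size = width * height * channels
    while True:
        buf = stream.read(frame_size)
        if len(buf) < frame_size:
            break  # end of stream, or a truncated final frame that is dropped
        # Each buffer could be turned into an image with
        # np.frombuffer(buf, np.uint8).reshape([height, width, channels]).
        yield buf


# Hand-written sample in the shape of `ffprobe -print_format json -show_streams`.
sample_probe = '{"streams": [{"codec_type": "video", "width": 4, "height": 2}]}'
width, height = video_size_from_probe(sample_probe)

# Three complete 4x2 BGR frames followed by 5 stray bytes (a truncated frame).
fake_stdout = io.BytesIO(bytes(4 * 2 * 3 * 3 + 5))
frames = list(iter_raw_frames(fake_stdout, width, height))
```

Dropping a short final read avoids the reshape error that a partially written last frame would otherwise cause.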
Upvotes: 12