Reputation: 1
I'm trying to merge a video with its separate audio file in Python. I have both the video and the audio as bytes in memory, and I would like to know how to get ffmpeg-python to merge them.
I have seen it done through ffmpeg.concat and by reading the files from disk using ffmpeg.input, but when my program downloads the files, it saves them in memory as bytes objects.
I tried passing the bytes objects into ffmpeg.concat, but it threw an error because it expects stream objects:
TypeError: Expected incoming stream(s) to be of one of the following types: ffmpeg.nodes.FilterableStream; got <class 'bytes'>
How should I approach the problem when my files are in bytes format?
Upvotes: 0
Views: 1574
Reputation:
Yes, it is possible to merge audio and video from memory using ffmpeg-python.
You need to tell FFmpeg to read the data from stdin and redirect the output to stdout; here is the documentation: FFmpeg protocols pipe documentation.
However, the documentation shows examples written in shell, using shell-specific pipe operators ('|', '>', '<', etc.). To communicate with standard streams in Python, you can use the subprocess module (along with the os module) and its Popen class to talk to stdin and stdout. See this StackOverflow answer.
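As a tiny illustration of that mechanism (independent of FFmpeg; cat is just a stand-in child process that echoes its stdin back to stdout):

```python
import subprocess

# Minimal sketch: send bytes to a child process's stdin and read its
# stdout back. "cat" simply echoes its input, standing in for FFmpeg.
proc = subprocess.Popen(["cat"], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
out, _ = proc.communicate(b"some media bytes")
# out == b"some media bytes"
```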
So, to do that using ffmpeg-python, you need to tell FFmpeg to load the video and the audio from two different pipes via subprocess.Popen. In this GitHub issue answer by the author of the library, edit memory video files #49, we can see how to do that for one pipe. But for two pipes, we need named pipes, created with the os.mkfifo function, in order to differentiate the audio from the video. To create multiple named pipes, you can read this StackOverflow answer: Multiple named pipes in FFmpeg.
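Here is a small stand-alone sketch of that pattern (no FFmpeg involved; the pipe names and payloads are made up): two named pipes, each fed by its own writer thread, read back independently.

```python
import os
import tempfile
import threading

# Create two named pipes in a temporary directory.
tmpdir = tempfile.mkdtemp()
video_pipe = os.path.join(tmpdir, "video_pipe")
audio_pipe = os.path.join(tmpdir, "audio_pipe")
os.mkfifo(video_pipe)
os.mkfifo(audio_pipe)

def write_pipe(path, payload):
    # open() blocks until a reader opens the other end of the FIFO.
    with open(path, "wb") as f:
        f.write(payload)

threads = [
    threading.Thread(target=write_pipe, args=(video_pipe, b"video-bytes")),
    threading.Thread(target=write_pipe, args=(audio_pipe, b"audio-bytes")),
]
for t in threads:
    t.start()

# Read each pipe like a regular file; each carries its own byte stream.
with open(video_pipe, "rb") as f:
    video_data = f.read()
with open(audio_pipe, "rb") as f:
    audio_data = f.read()

for t in threads:
    t.join()

os.unlink(video_pipe)
os.unlink(audio_pipe)
```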
Now that we know this, the only thing left is how to merge the video and the audio.
To do this we are going to use the ffmpeg.concat function from the ffmpeg-python bindings.
When you tried this, you got the TypeError because you were passing the audio and video directly as bytes objects instead of the expected stream objects. You must use ffmpeg.input, which returns a stream object.
To get the result on stdout and keep it in memory, we are going to use ffmpeg.output('pipe:') to redirect the FFmpeg output to standard output.
Now that we know everything, let's start coding:
import os
import subprocess
import threading

import ffmpeg


def writer(data, pipe_name, chunk_size=8192):
    # The chunk size choice (8 kB) is arbitrary. Read the notes below the code.
    # Open the named pipe like a file (os.O_WRONLY = open for writing only).
    # fd_pipe is a file descriptor (an integer).
    fd_pipe = os.open(pipe_name, os.O_WRONLY)
    for i in range(0, len(data), chunk_size):
        # Write chunk_size bytes of data to the named pipe,
        # as if writing to a file.
        os.write(fd_pipe, data[i:i + chunk_size])
    # Close the pipe like closing a file.
    os.close(fd_pipe)


if __name__ == "__main__":
    # Create the "named pipes". Note that os.mkfifo returns None:
    # the pipes are referenced by their file names.
    pipe1 = "video_pipe"
    pipe2 = "audio_pipe"
    os.mkfifo(pipe1)
    os.mkfifo(pipe2)

    # Create ffmpeg-python streams. A named pipe is opened
    # like a regular file, by name.
    input_video = ffmpeg.input(pipe1)
    input_audio = ffmpeg.input(pipe2)

    # Merge the video and the audio, and get the resulting FFmpeg arguments.
    # An explicit output format is required because "pipe:" (stdout) is not
    # seekable, so FFmpeg cannot guess the format from a file extension;
    # matroska is one container choice among others.
    args = (
        ffmpeg
        .concat(input_video, input_audio, v=1, a=1)
        .output("pipe:", format="matroska")
        .get_args()
    )

    # Open FFmpeg as a sub-process with a stdout pipe.
    process = subprocess.Popen(["ffmpeg"] + args, stdout=subprocess.PIPE)

    # Initialize two "writer" threads; each writer writes data
    # to a named pipe in chunks of bytes.
    # I assume that you have stored the video in a variable named
    # video_in_memory, and the audio in audio_in_memory.
    thread1 = threading.Thread(target=writer, args=(video_in_memory, pipe1))
    thread2 = threading.Thread(target=writer, args=(audio_in_memory, pipe2))

    # Start the two threads.
    thread1.start()
    thread2.start()

    # Store the processed video in memory. communicate() reads stdout
    # while FFmpeg runs and then waits for it to exit; reading before
    # waiting avoids a deadlock when the output is larger than the OS
    # pipe buffer.
    output_data, _ = process.communicate()

    # Wait for the two writer threads to finish.
    thread1.join()
    thread2.join()

    # Remove the "named pipes".
    os.unlink(pipe1)
    os.unlink(pipe2)
Notes:
It is also possible to do a streaming approach where you read a little bit of the video/audio in memory, pump it into ffmpeg, and then stream the upload piece by piece, but it's a bit harder because you either have to use non-blocking IO or multiple threads/greenlets (e.g. using gevent). It can be done, but I'd start with the 'read-entire-file-to-memory' approach first.
This code has been tested (slightly modified) with a video downloaded from YouTube via yt-dlp
without audio, plus the same video's audio-only stream, on Arch Linux ARM.
I am sorry, but this program may only work on Unix-like systems (POSIX-compliant, e.g. *BSD and Linux distributions), since it uses some "low-level" OS operations; this is why (I think) @kesh asked you whether you were in a non-Windows environment.
UPDATE: @MlgEpicBanana said that it is possible to create a named pipe through pywin32 on Windows. I did not know that, since I am not using a Windows environment, but these links may help you: Official Windows Documentation, Official PyWin32 Documentation, and this StackOverflow question.
On POSIX-compliant systems or on Windows, the maximum guaranteed size of a named pipe transaction is 64 kilobytes. In some limited cases, transactions beyond 64 kilobytes are possible, depending on OS versions participating in the transaction and dynamic network conditions. However, there is no guarantee that transactions above 64 kilobytes will succeed.
Because the data may be larger than 65536 bytes, we need to write it to the pipe in small chunks. I chose 8 kilobytes arbitrarily. You can use a smaller or larger chunk size, as long as it stays below 64 kilobytes, depending on what you are doing.
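The chunking loop from the writer function can be sketched on its own (the 20 000-byte payload here is a made-up example):

```python
def chunks(data, chunk_size=8192):
    # Yield successive chunk_size-byte slices of data;
    # the last chunk may be shorter.
    for i in range(0, len(data), chunk_size):
        yield data[i:i + chunk_size]

payload = bytes(20000)          # dummy 20 000-byte buffer
pieces = list(chunks(payload))  # -> slices of 8192, 8192 and 3616 bytes
```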
English is not my native language (I am French), so sorry for any spelling or grammatical mistakes.
Upvotes: 3