Reputation: 3
This program returns the resolution of each video, but since I need it for a large-scale project, I need multiprocessing. I have tried parallel processing using a different function, but that just ran the whole thing multiple times instead of making it efficient. I am posting the entire code. Can you help me create a main process that uses all cores?
import os
from tkinter.filedialog import askdirectory
from moviepy.editor import VideoFileClip

if __name__ == "__main__":
    dire = askdirectory()
    d = dire[:]
    print(dire)
    death = os.listdir(dire)
    print(death)
    for i in death:  # multiprocess this loop
        dire = d
        dire += f"/{i}"
        v = VideoFileClip(dire)
        print(f"{i}: {v.size}")
This code works fine, but I need help creating a main process (one that uses all cores) for the for loop alone. Please excuse the variable names; I was angry at multiprocessing. Also, if you have tips on making the code more efficient, I would appreciate them.
Upvotes: 0
Views: 97
Reputation: 1954
moviepy is just a wrapper around ffmpeg and is designed for editing clips, so it works with one file at a time and its performance is quite poor. Starting a new process for each of a large number of files is time-consuming. In the end, the need for multiple processes might be the result of choosing the wrong library.
I'd recommend using the PyAV library instead, which provides direct Python bindings for ffmpeg and good performance:
import av
import os
from tkinter.filedialog import askdirectory
import multiprocessing
from concurrent.futures import ThreadPoolExecutor as Executor

MAX_WORKERS = int(multiprocessing.cpu_count() * 1.5)


def get_video_resolution(path):
    container = None
    try:
        container = av.open(path)
        # Decode only the first frame of the first video stream
        frame = next(container.decode(video=0))
        return path, f"{frame.width}x{frame.height}"
    finally:
        if container:
            container.close()


def files_to_process():
    video_dir = askdirectory()
    # Lazily yield the full path of every entry that is not a directory
    paths = (os.path.join(video_dir, f) for f in os.listdir(video_dir))
    return (p for p in paths if not os.path.isdir(p))


def main():
    for f in files_to_process():
        print(f"{os.path.basename(f)}: {get_video_resolution(f)[1]}")


def main_multi_threaded():
    with Executor(max_workers=MAX_WORKERS) as executor:
        for path, resolution in executor.map(get_video_resolution, files_to_process()):
            print(f"{os.path.basename(path)}: {resolution}")


if __name__ == "__main__":
    # main()
    main_multi_threaded()
Above are single-threaded and multi-threaded implementations, the latter with the pool size set to 1.5 times the CPU count (in case multithreading is absolutely required).
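Because executor.map re-raises the first exception it encounters, a single unreadable or non-video file would abort the whole run. Below is a minimal sketch of per-file error isolation, assuming a probe callable with the same shape as get_video_resolution above (path in, (path, resolution) out). The names probe_all and fake_probe are hypothetical, and fake_probe only stands in for the real PyAV call so that the example runs without any video files:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def probe_all(paths, probe, max_workers=4):
    """Run probe(path) for every path in a thread pool.

    Returns (results, errors): results maps path -> resolution string,
    errors maps path -> the exception raised for that path.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Remember which future belongs to which path.
        future_to_path = {executor.submit(probe, p): p for p in paths}
        for future in as_completed(future_to_path):
            path = future_to_path[future]
            try:
                _, resolution = future.result()
                results[path] = resolution
            except Exception as exc:  # one bad file must not stop the rest
                errors[path] = exc
    return results, errors


def fake_probe(path):
    """Hypothetical stand-in for get_video_resolution (no real decoding)."""
    if not path.endswith(".mp4"):
        raise ValueError(f"not a video: {path}")
    return path, "1920x1080"


if __name__ == "__main__":
    ok, failed = probe_all(["a.mp4", "notes.txt", "b.mp4"], fake_probe)
    print(ok)
    print(failed)
```

With the real function you would call probe_all(files, get_video_resolution) and report the failed paths at the end instead of stopping at the first one.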
Upvotes: 1
Reputation: 44283
You are, I suppose, assuming that every file in the directory is a video clip. I am assuming that processing a video clip is an I/O-bound "process" for which threading is appropriate. Here I have rather arbitrarily capped the thread pool size at 20 threads this way:
MAX_WORKERS = 20 # never more than this
N_WORKERS = min(MAX_WORKERS, len(death))
You would have to experiment with how large MAX_WORKERS can be before performance degrades. This might be a low number, not because your system cannot support lots of threads, but because concurrent access to multiple files that may be spread across the disk can be inefficient.
import os
from tkinter.filedialog import askdirectory
from moviepy.editor import VideoFileClip
from concurrent.futures import ThreadPoolExecutor as Executor
from functools import partial


def process_video(parent_dir, file):
    v = VideoFileClip(f"{parent_dir}/{file}")
    print(f"{file}: {v.size}")


if __name__ == "__main__":
    dire = askdirectory()
    print(dire)
    death = os.listdir(dire)
    print(death)
    # Bind the directory so the worker only needs the file name
    worker = partial(process_video, dire)
    MAX_WORKERS = 20  # never more than this
    N_WORKERS = min(MAX_WORKERS, len(death))
    with Executor(max_workers=N_WORKERS) as executor:
        results = executor.map(worker, death)  # results is an iterator yielding None for each file
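One way to run the experiment with MAX_WORKERS described above is to time the same batch at several pool sizes. This is a rough sketch under stated assumptions: time_pool and fake_worker are hypothetical names, and fake_worker just sleeps to imitate an I/O-bound call. In practice you would pass the real worker and file list, and repeat each measurement, since disk caching can skew the first run.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def time_pool(worker, items, n_workers):
    """Return the wall-clock seconds needed to map worker over items."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_workers) as executor:
        # list() drains the iterator so every task finishes (and any
        # exception surfaces) before the clock stops.
        list(executor.map(worker, items))
    return time.perf_counter() - start


def fake_worker(item):
    """Hypothetical stand-in for process_video: pretend to wait on I/O."""
    time.sleep(0.05)
    return item


if __name__ == "__main__":
    items = list(range(20))
    for n in (1, 2, 5, 10, 20):
        print(f"{n:>2} workers: {time_pool(fake_worker, items, n):.2f}s")
```

The pool size at which the timings stop improving (or start getting worse) is a reasonable candidate for MAX_WORKERS on that machine and disk.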
Update
According to @Reishin, moviepy ends up running the ffmpeg executable and thus creates a separate process in which the work is done. So there is no point in also using multiprocessing here.
Upvotes: 1