Vidit Aggarwal

Reputation: 3

Creating main process for a for loop

This program prints the resolution of each video in a directory, but since I need it for a large-scale project, I need multiprocessing. I have tried parallel processing using a different function, but that just ran the whole thing multiple times, which did not make it any more efficient. I am posting the entire code. Can you help me create a main process that uses all cores?

import os
from tkinter.filedialog import askdirectory
from moviepy.editor import VideoFileClip


if __name__ == "__main__":
    dire = askdirectory()
    d = dire[:]
    print(dire)
    death = os.listdir(dire)
    print(death)
    for i in death: #multiprocess this loop
        dire = d
        dire += f"/{i}"
        v = VideoFileClip(dire)
        print(f"{i}: {v.size}")

This code works fine, but I need help creating a main process (one that uses all cores) for the for loop alone. Please excuse the variable names; I was angry at multiprocessing. Also, if you have tips on making the code more efficient, I would appreciate them.

Upvotes: 0

Views: 97

Answers (2)

Reishin

Reputation: 1954

moviepy is just a wrapper around ffmpeg and is designed for editing clips, so it works with one file at a time and its performance is quite poor. Launching a new process for each of a large number of files is time-consuming. In the end, the need for multiple processes might be the result of choosing the wrong library.

I would recommend using the PyAV library instead, which provides direct Python bindings for ffmpeg and good performance:

import av
import os
from tkinter.filedialog import askdirectory
import multiprocessing
from concurrent.futures import ThreadPoolExecutor as Executor

MAX_WORKERS = int(multiprocessing.cpu_count() * 1.5)

def get_video_resolution(path):
  container = None
  try:
    container = av.open(path)
    frame = next(container.decode(video=0))
    return path, f"{frame.width}x{frame.height}"
  finally:
    if container:
      container.close()

def files_to_process():
  video_dir = askdirectory()
  paths = (os.path.join(video_dir, f) for f in os.listdir(video_dir))
  # Keep regular entries only; skip subdirectories.
  return (p for p in paths if not os.path.isdir(p))


def main():
  for f in files_to_process():
    print(f"{os.path.basename(f)}: {get_video_resolution(f)[1]}")


def main_multi_threaded():
  with Executor(max_workers=MAX_WORKERS) as executor:
    for path, resolution in executor.map(get_video_resolution, files_to_process()):
      print(f"{os.path.basename(path)}: {resolution}")


if __name__ == "__main__":
  #main()
  main_multi_threaded()

Above are single-threaded and multi-threaded implementations, with an optimal parallelism setting (in case multithreading is absolutely required).
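One thing to watch with executor.map: if a worker raises (for example because a file in the directory is not a video), the exception is re-raised while iterating the results and the loop stops there. Below is a minimal sketch of collecting per-file errors instead. The helper name probe_all and its max_workers default are my own assumptions for illustration, not part of PyAV or of the code above; probe can be any function that takes a path, such as get_video_resolution.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def probe_all(probe, paths, max_workers=8):
    """Run probe(path) on every path; return (results, errors) dicts keyed by path.

    probe_all is a hypothetical helper, not a library function.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to the path it was submitted for.
        futures = {executor.submit(probe, p): p for p in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # e.g. the file is not a video
                errors[path] = exc
    return results, errors
```

Results arrive in completion order rather than input order, which is why both dicts are keyed by path.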

Upvotes: 1

Booboo

Reputation: 44283

You are, I suppose, assuming that every file in the directory is a video clip. I am assuming that processing a video clip is an I/O-bound "process" for which threading is appropriate. Here I have rather arbitrarily created a thread pool with a maximum size of 20 threads this way:

MAX_WORKERS = 20 # never more than this
N_WORKERS = min(MAX_WORKERS, len(death))

You would have to experiment with how large MAX_WORKERS can be before performance degrades. The limit might be a low number, not because your system cannot support lots of threads, but because concurrent access to multiple files that may be spread across the disk may be inefficient.

import os
from tkinter.filedialog import askdirectory
from moviepy.editor import VideoFileClip
from concurrent.futures import ThreadPoolExecutor as Executor
from functools import partial


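def process_video(parent_dir, file):
    # Open the clip only long enough to read its size, then release
    # the ffmpeg reader that moviepy starts for it.
    v = VideoFileClip(os.path.join(parent_dir, file))
    try:
        print(f"{file}: {v.size}")
    finally:
        v.close()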


if __name__ == "__main__":
    dire = askdirectory()
    print(dire)
    death = os.listdir(dire)
    print(death)
    worker = partial(process_video, dire)
    MAX_WORKERS = 20 # never more than this
    N_WORKERS = min(MAX_WORKERS, len(death))
    with Executor(max_workers=N_WORKERS) as executor:
        # results is an iterator that yields None for each file;
        # iterate over it to surface any exception raised in a worker.
        results = executor.map(worker, death)
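To experiment with the pool size as suggested above, something like the following rough sketch could be used. The function name time_pool_sizes is hypothetical, and the numbers it produces are only indicative: the operating system's file cache can make later runs look faster than the first one.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def time_pool_sizes(task, items, sizes):
    """Return {pool_size: elapsed_seconds} for running task over items.

    time_pool_sizes is a hypothetical helper for rough comparisons only.
    """
    items = list(items)  # reuse the same work list for every pool size
    timings = {}
    for size in sizes:
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=size) as executor:
            # Drain the iterator so every task finishes before the clock stops.
            list(executor.map(task, items))
        timings[size] = time.perf_counter() - start
    return timings
```

For example, time_pool_sizes(worker, death, [1, 5, 10, 20]) would time the same directory with four different pool sizes.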

Update

According to @Reishin, moviepy ends up running the ffmpeg executable and thus creates a separate process in which the work is done. So there is no point in also using multiprocessing here.
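To illustrate the point, here is a small sketch in which each thread simply waits on an external process, so the real work already happens outside the Python interpreter. The child here is just another Python process standing in for ffmpeg, and the names run_child and run_children are made up for this example.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_child(n):
    """Start a separate Python process that squares n and return its output."""
    # The child process stands in for an external tool such as ffmpeg.
    completed = subprocess.run(
        [sys.executable, "-c", f"print({n} * {n})"],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(completed.stdout.strip())


def run_children(numbers, max_workers=4):
    """Drive several child processes at once from a pool of threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(run_child, numbers))
```

While a thread is blocked waiting for its child process, other threads are free to run, so several children can be active at the same time without any use of the multiprocessing module.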

Upvotes: 1
