Greg
Greg

Reputation: 6613

python - using multiple processes / parallel processing

So I wrote a small python script that allows me to specify some folder that contains video files, and some output directory, and the program loops over all the video files, and converts them using handbrake like:

proc = subprocess.Popen('HandBrakeCLI -i ... -o ...')
proc.wait()

So it does each file in the directory, one by one. I have a quad core machine, and want to speed things by doing video conversions in parallel but I don't entirely understand how to approach that.

Do I need something like os.fork()? Something more like the multiprocessing module? I come from single threaded javascript land so this relatively new to me.

Thanks for any input!

Upvotes: 3

Views: 1003

Answers (2)

Spencer Rathbun
Spencer Rathbun

Reputation: 14900

I'd suggest using the envoy library. Under the hood it uses the threading library to spawn new cmd line programs, so if you use the connect function like so:

import envoy
proc = envoy.connect('HandBrakeCLI -i ... -o ...')
while proc.status_code = None:
    sleep(5)

You can spawn as many as you want at a time, and wait until one exits before spawning another. Please note if you have issues, that I've got a fork with my fixes that you may want to check.

I ran into a quoting issue in how the shlex library handles quotes and what your cmd line program is expecting. And, since I used it on windows, in a posix/non-posix mode problem.

Upvotes: 0

mgilson
mgilson

Reputation: 309891

Something similar to this should do the trick:

import multiprocessing
...
def wrapper(currentfile):
    #create your commandline -- for the sake of simplicity, I assume that
    #the only thing that changes is the filename you are passing to
    #HandBrakeCLI (and that the filename is the last argument to pass)
    cmd='HandBrakeCLI -i ... -o ... '+currentfile 
    proc = subprocess.Popen(cmd)
    proc.wait()
    return 'foo'

files=os.listdir(path)  #Any way that you build up your list of files is fine
output = multiprocessing.Pool(4).map(wrapper,files) #output is ['foo', 'foo', ..., 'foo']

Of course, this uses a map-like function for it's side-effects which many python people dislike... but I find it to be intuitive enough -- especially if you leave a comment. I also made the function return 'foo' to demonstrate that you can access the return value from the function quite easily.

Upvotes: 2

Related Questions