FFredrixson

Reputation: 51

Python multiprocessing

This question is more about fact-finding and thought process than about code.

I have many compiled C++ programs that I need to run at different times and with different parameters. I'm looking at using Python multiprocessing to read a job from a job queue (RabbitMQ) and then feed that job to a C++ program to run (maybe via subprocess). I was looking at the multiprocessing module because this will all run on a dual-Xeon server, so I want to take full advantage of the multi-processor capability of my server.

The Python program would be the central manager and would simply read jobs from the queue, spawn a process (or subprocess?) with the appropriate C++ program to run the job, get the results (subprocess stdout & stderr), feed that to a callback and put the process back in a queue of processes waiting for the next job to run.
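
Roughly, the kind of manager loop I have in mind (get_next_job, report_result, and ./my_cpp_program below are just hypothetical placeholders for the RabbitMQ consumer, the result callback, and one of my C++ programs):

import subprocess

def get_next_job():
    # Placeholder for the real RabbitMQ consumer; returns the argument
    # list for one C++ run, or None when there is nothing left to do.
    ...

def report_result(out, err):
    # Placeholder for the callback that receives the job's results.
    ...

def manager_loop():
    while True:
        args = get_next_job()
        if args is None:
            break
        # Run the appropriate C++ program for this job
        proc = subprocess.Popen(
            ['./my_cpp_program'] + args,
            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        out, err = proc.communicate()
        report_result(out, err)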

First, does this sound like a valid strategy?

Second, are there any examples of something similar to this?

Thank you in advance.

Upvotes: 5

Views: 2743

Answers (3)

unutbu

Reputation: 880587

The Python program would be the central manager and would simply read jobs from the queue, spawn a process (or subprocess?) with the appropriate C++ program to run the job, get the results (subprocess stdout & stderr), feed that to a callback and put the process back in a queue of processes waiting for the next job to run.

You don't need the multiprocessing module for this. The multiprocessing module is good for running Python functions as separate processes. To run a C++ program and read results from stdout, you'd only need the subprocess module. The queue could be a list, and your Python program would simply loop while the list is non-empty.
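
For example, a minimal single-process sketch of that idea (the job list and ./my_cpp_program are placeholders, not your actual programs):

import subprocess

# Placeholder job list; each entry is the command line for one C++ run
jobs = [['./my_cpp_program', '--param', '1'],
        ['./my_cpp_program', '--param', '2']]

while jobs:
    cmd = jobs.pop(0)
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()
    print(out.decode())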


However, if you want to

  1. spawn multiple worker processes
  2. have them read from a common queue
  3. use the arguments from the queue to spawn C++ programs (in parallel)
  4. use the output of the C++ programs to put new items in the queue

then you could do it with multiprocessing like this:

test.py:

import multiprocessing as mp
import subprocess
import sys

def worker(q):
    while True:
        # Get an argument from the queue
        x = q.get()

        # You might change this to run your C++ program
        proc = subprocess.Popen(
            [sys.executable, 'test2.py', str(x)], stdout=subprocess.PIPE)
        out, err = proc.communicate()
        out = out.decode().strip()

        print('{name}: using argument {x} outputs {o}'.format(
            x=x, name=mp.current_process().name, o=out))

        q.task_done()

        # Put a new argument into the queue
        q.put(int(out))

def main():
    q = mp.JoinableQueue()

    # Put some initial values into the queue
    for t in range(1, 4):
        q.put(t)

    # Create and start a pool of worker processes
    for i in range(3):
        p = mp.Process(target=worker, args=(q,))
        p.daemon = True
        p.start()
    q.join()
    print("Finished!")

if __name__ == '__main__':
    main()

test2.py (a simple substitute for your C++ program):

import time
import sys

# Read the argument, simulate some work, then print the result
x = int(sys.argv[1])
time.sleep(0.5)
print(x + 3)

Running test.py might yield something like this:

Process-1: using argument 1 outputs 4
Process-3: using argument 3 outputs 6
Process-2: using argument 2 outputs 5
Process-3: using argument 6 outputs 9
Process-1: using argument 4 outputs 7
Process-2: using argument 5 outputs 8
Process-3: using argument 9 outputs 12
Process-1: using argument 7 outputs 10
Process-2: using argument 8 outputs 11
Process-1: using argument 10 outputs 13

Notice that the numbers in the right-hand column are fed back into the queue, and are (eventually) used as arguments to test2.py and show up as numbers in the left-hand column.

Upvotes: 11

Eli Bendersky

Reputation: 273696

Sounds like a good strategy, but you don't need the multiprocessing module for it; you need the subprocess module. subprocess is for running child processes from a Python program and interacting with them (stdin, stdout, pipes, etc.), while multiprocessing is more about distributing Python code to run in multiple processes to gain performance through parallelism.

Depending on the responsiveness strategy, you may also want to look at threading for launching subprocesses from a thread. This will allow you to wait on one subprocess while still being responsive on the queue to accept other jobs.
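
A minimal sketch of that idea, assuming the jobs arrive on a standard queue.Queue (./my_cpp_program is a placeholder for your binary):

import queue
import subprocess
import threading

job_queue = queue.Queue()

def run_job(args):
    # Each job runs in its own thread, so waiting on the subprocess
    # doesn't block the loop that watches the queue.
    proc = subprocess.Popen(
        ['./my_cpp_program'] + args,
        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()
    print(out.decode())

def dispatcher():
    while True:
        args = job_queue.get()    # blocks until a job arrives
        if args is None:          # sentinel to shut down
            break
        threading.Thread(target=run_job, args=(args,)).start()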

Upvotes: 0

S.Lott

Reputation: 391972

First, does this sound like a valid strategy?

Yes.

Second, are there any type of examples of something similar to this?

Celery
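
For example, a Celery task wrapping one of the C++ programs might look roughly like this (the broker URL and ./my_cpp_program are placeholders):

import subprocess
from celery import Celery

# Celery can use RabbitMQ as its broker
app = Celery('jobs', broker='amqp://guest@localhost//')

@app.task
def run_cpp_job(args):
    # Run the C++ program and hand its output back to Celery
    proc = subprocess.Popen(
        ['./my_cpp_program'] + args,
        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()
    return out.decode()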

Upvotes: 1
