Jaime Perez

Reputation: 109

python multiprocessing can't find error

I'm trying to run a function with multiprocessing. This is the code:

import multiprocessing as mu

output = []
def f(x):
    output.append(x*x)


jobs = []
np = mu.cpu_count()

for n in range(np*500):
    p = mu.Process(target=f, args=(n,))
    jobs.append(p)


running = []

for i in range(np):
    p = jobs.pop()
    running.append(p)
    p.start()

while jobs != []:
    for r in running:
        if r.exitcode == 0:
            try:
                running.remove(r)
                p = jobs.pop()
                p.start()
                running.append(p)
            except IndexError:
                break

print "Done:"
print output

The output is [], while it should be [1, 4, 9, ...]. Does anyone see where I'm making a mistake?

Upvotes: 1

Views: 3060

Answers (2)

Jkm

Reputation: 187

Edit: Thanks to @Roland Smith for pointing this out. The main problem is the function f(x): when the child processes call it, they cannot reach the parent's output variable, since it is not shared between processes.

Edit: Just as @cdarke said, with multiprocessing you have to carefully control any shared object that the child processes can access (perhaps with a lock), and that gets complicated and hard to debug.
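For illustration (this is not in the original answer), a minimal sketch of that shared-object route using a multiprocessing.Manager list, which makes the question's output list actually shared between processes; the count of 8 jobs is just an example:

import multiprocessing as mu

def f(x, output):
    output.append(x * x)  # output is a manager proxy, so the append is visible to the parent

if __name__ == '__main__':
    manager = mu.Manager()
    output = manager.list()  # a list proxy shared between processes
    jobs = [mu.Process(target=f, args=(n, output)) for n in range(8)]
    for p in jobs:
        p.start()
    for p in jobs:
        p.join()
    print list(output)

The manager runs a server process and hands out proxies, so every append goes through inter-process communication; it works, but it is slower and clunkier than the Pool approach below.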

Personally, I suggest using the Pool.map method for this.

For instance, assuming you run this code directly rather than importing it as a module, your code would be:

import multiprocessing as mu

def f(x):
    return x*x

if __name__ == '__main__':   
    np = mu.cpu_count()
    args = [n for n in range(np*500)]

    pool = mu.Pool(processes=np)
    result = pool.map(f, args)
    pool.close()
    pool.join()
    print result

But there are a few things you should know:

  1. If you run this file directly rather than importing it as a module, the if __name__ == '__main__': guard is important: Python will load this file as a module in each child process, so if you don't place the function f outside the if __name__ == '__main__': block, the child processes will not be able to find it.
  2. (Edit: thanks to @Roland Smith for pointing out that we can use a tuple.) If your function f takes more than one argument, you can pack the arguments into a tuple, for instance:

    def f((x,y)):
        return x*y
    
    args = [(n,1) for n in range(np*500)]
    result = pool.map(f, args)
    

    or check here for more detailed discussion
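One caveat: the def f((x,y)): tuple-parameter syntax is Python 2 only (it was removed in Python 3 by PEP 3113). A sketch that avoids it by unpacking the tuple inside the function body instead (the print statement keeps the Python 2 style of the rest of this answer):

import multiprocessing as mu

def f(args):
    x, y = args  # unpack inside the body; valid in Python 2 and 3
    return x * y

if __name__ == '__main__':
    np = mu.cpu_count()
    args = [(n, 1) for n in range(np*500)]
    pool = mu.Pool(processes=np)
    result = pool.map(f, args)
    pool.close()
    pool.join()
    print result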

Upvotes: 0

Roland Smith

Reputation: 43495

You are using multiprocessing, not threading. So your output list is not shared between the processes.

There are several possible solutions:

  1. Retain most of your program but use a multiprocessing.Queue instead of a list. Let the workers put their results in the queue, and read them from the main program. The queue copies data from process to process, so for big chunks of data this has significant overhead (see the first sketch below).
  2. Use shared memory in the form of a multiprocessing.Array. This might be the best solution if the processed data is large (see the second sketch below).
  3. Use a Pool. This takes care of all the process management for you. Just like with a queue, it copies data from process to process. It is probably the easiest to use. IMO this is the best option if the data sent to/from each worker is small.
  4. Use threading, so that the output list is shared between threads. Threading in CPython has the restriction that only one thread at a time can execute Python bytecode, so you might not get as much of a performance benefit as you'd expect. And unlike the multiprocessing solutions, it will not take advantage of multiple cores (see the third sketch below).
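A minimal sketch of option 1, reworking the question's code around a multiprocessing.Queue (the worker count of 8 and the name q are just for illustration):

import multiprocessing as mu

def f(x, q):
    q.put(x * x)  # send the result back to the parent via the queue

if __name__ == '__main__':
    q = mu.Queue()
    jobs = [mu.Process(target=f, args=(n, q)) for n in range(8)]
    for p in jobs:
        p.start()
    output = [q.get() for _ in jobs]  # one result per worker; get() blocks until one arrives
    for p in jobs:
        p.join()
    print output

Note that q.get() returns results in completion order, not submission order, so the list may come out unsorted.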
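A sketch of option 2 with multiprocessing.Array: each worker writes its result into its own slot of a shared array of C ints (typecode 'i'), so the results need no copying and come out in input order:

import multiprocessing as mu

def f(i, arr):
    arr[i] = i * i  # write directly into shared memory

if __name__ == '__main__':
    n = 8  # example size
    arr = mu.Array('i', n)  # shared, zero-initialised array of C ints
    jobs = [mu.Process(target=f, args=(i, arr)) for i in range(n)]
    for p in jobs:
        p.start()
    for p in jobs:
        p.join()
    print list(arr)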
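And a sketch of option 4 with threading; here the output list really is shared, because all the threads run in the same process:

import threading

output = []

def f(x):
    output.append(x * x)  # list.append is atomic in CPython thanks to the GIL

if __name__ == '__main__':
    threads = [threading.Thread(target=f, args=(n,)) for n in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print output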

Upvotes: 1
