Reputation: 2086
I have a function "function" that I want to call 10 times using 2 times 5 cpus with multiprocessing.
Therefore I need a way to synchronize the processes as described in the code below.
Is this possible without using a multiprocessing pool? I get strange errors if I do so (for example "UnboundLocalError: local variable 'fd' referenced before assignment" (I don't have such a variable)). Also the processes seem to terminate randomly.
If possible I would like to do this without a pool. Thanks!
number_of_cpus = 5
number_of_iterations = 2
# An array for the processes.
processing_jobs = []
# Start 5 processes 2 times.
for iteration in range(0, number_of_iterations):
# TODO SYNCHRONIZE HERE
# Start 5 processes at a time.
for cpu_number in range(0, number_of_cpus):
# Calculate an offset for the current function call.
file_offset = iteration * cpu_number * number_of_files_per_process
p = multiprocessing.Process(target=function, args=(file_offset,))
processing_jobs.append(p)
p.start()
# TODO SYNCHRONIZE HERE
This is an (anonymized) traceback of the errors I get when I run the code in a pool:
Process Process-5:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "python_code_3.py", line 88, in function_x
xyz = python_code_1.function_y(args)
File "/python_code_1.py", line 254, in __init__
self.WK = file.WK(filename)
File "/python_code_2.py", line 1754, in __init__
self.__parse__(name, data, fast_load)
File "/python_code_2.py", line 1810, in __parse__
fd.close()
UnboundLocalError: local variable 'fd' referenced before assignment
Most of the processes crash like that but not all of them. More of them seem to crash when I increase the number of processes. I also thought this might be due to memory limitations...
Upvotes: 3
Views: 14325
Reputation: 94881
Here's how you can do the synchronization you're looking for without using a pool:
import multiprocessing
def function(arg):
print ("got arg %s" % arg)
if __name__ == "__main__":
number_of_cpus = 5
number_of_iterations = 2
# An array for the processes.
processing_jobs = []
# Start 5 processes 2 times.
for iteration in range(1, number_of_iterations+1): # Start the range from 1 so we don't multiply by zero.
# Start 5 processes at a time.
for cpu_number in range(1, number_of_cpus+1):
# Calculate an offset for the current function call.
file_offset = iteration * cpu_number * number_of_files_per_process
p = multiprocessing.Process(target=function, args=(file_offset,))
processing_jobs.append(p)
p.start()
# Wait for all processes to finish.
for proc in processing_jobs:
proc.join()
# Empty active job list.
del processing_jobs[:]
# Write file here
print("Writing")
Here it is with a Pool
:
import multiprocessing
def function(arg):
print ("got arg %s" % arg)
if __name__ == "__main__":
number_of_cpus = 5
number_of_iterations = 2
pool = multiprocessing.Pool(number_of_cpus)
for i in range(1, number_of_iterations+1): # Start the range from 1 so we don't multiply by zero
file_offsets = [number_of_files_per_process * i * cpu_num for cpu_num in range(1, number_of_cpus+1)]
pool.map(function, file_offsets)
print("Writing")
# Write file here
As you can see, the Pool
solution is nicer.
This doesn't solve your traceback problem, though. It's hard for me to say how to fix that without understanding what's actually causing that. You may need to use a multiprocessing.Lock
to synchronize access to the resource.
Upvotes: 1
Reputation: 15170
A Pool can be very easy to use. Here's a full example:
import multiprocessing
def calc(num):
return num*2
if __name__=='__main__': # required for Windows
pool = multiprocessing.Pool() # one Process per CPU
for output in pool.map(calc, [1,2,3]):
print 'output:',output
output: 2
output: 4
output: 6
Upvotes: 1