Reputation: 1095
I am trying to process a file by cutting it up into chunks and running them through a function which processes the chunks and returns a numpy array. After looking around it seems the best method would be to use the Pool.map method by passing through classes as the arguments. These classes are initiated with the chunk sections as a variable, and another variable to store the numpy array. The output list of classes can then be parsed to get out the information I need to continue with the problem. Here is a simplified version of the script I am trying to write:
from multiprocessing import Pool
class container():
def __init__(self, k):
self.input_section = k
self.ouput_answer = 0
def compute(object_class):
# Main operation would go on in here....
object_class.output_answer = object_class.input_section
return object_class
def Main():
# Create list of classes to path as arguments
sections = [container(k) for k in range(10)]
# Create pool and compute modified classes
with Pool(4) as p:
results = p.map(compute, sections)
# Decode here to get answers
sections = [k.output_answer for k in results]
# Print answers
print(sections)
if __name__ == '__main__':
Main()
This is the error that I get when I run the script:
Exception in thread Thread-9: Traceback (most recent call last):
File "C:\Users\rbernon\AppData\Local\Continuum\Anaconda3\lib\threading.py", line 916, in _bootstrap_inner self.run() File "C:\Users\rbernon\AppData\Local\Continuum\Anaconda3\lib\threading.py", line 864, in run self._target(*self._args, **self._kwargs) File "C:\Users\rbernon\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\pool.py", line 463, in _handle_results task = get() File "C:\Users\rbernon\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\connection.py", line 251, in recv return _ForkingPickler.loads(buf.getbuffer()) AttributeError: Can't get attribute 'container' on module '__main__' from 'C:\\Users\\rbernon\\AppData\\Local\\Continuum\\Anaconda3\\lib\\site-packages\\spyder\\utils\\ipython\\start_kernel.py'>
Any help would be greatly apprectiated!
Upvotes: 0
Views: 755
Reputation: 43495
Keep in mind that every piece of data you want to have processed needs to be pickled and sent to the worker processes.
The overhead of this will reduce (and might even eliminate) the advantages of using multiple processes.
If the data file is large, it is probably better to send each worker a start and end offset as a 2-tuple of numbers, so each worker can read part of the file and process it.
Upvotes: 1