Rob123

Reputation: 1095

How to pass classes into Pool.map as arguments - Pickling Error

I am trying to process a file by cutting it into chunks and running them through a function which processes each chunk and returns a numpy array. After looking around, it seems the best method would be to use Pool.map, passing class instances as the arguments. These instances are initialized with a chunk of the file as one variable and another variable to store the resulting numpy array. The output list of instances can then be parsed to get out the information I need to continue with the problem. Here is a simplified version of the script I am trying to write:

from multiprocessing import Pool

class container():

    def __init__(self, k):
        self.input_section = k
        self.output_answer = 0

def compute(object_class):
    # Main operation would go on in here....

    object_class.output_answer = object_class.input_section
    return object_class

def Main():

    # Create list of class instances to pass as arguments
    sections = [container(k) for k in range(10)]

    # Create pool and compute modified classes
    with Pool(4) as p:
        results = p.map(compute, sections)

    # Decode here to get answers
    sections = [k.output_answer for k in results]

    # Print answers
    print(sections)

if __name__ == '__main__':
    Main()

This is the error that I get when I run the script:

Exception in thread Thread-9:
Traceback (most recent call last):

File "C:\Users\rbernon\AppData\Local\Continuum\Anaconda3\lib\threading.py", line 916, in _bootstrap_inner
 self.run()   
File "C:\Users\rbernon\AppData\Local\Continuum\Anaconda3\lib\threading.py",  line 864, in run
 self._target(*self._args, **self._kwargs)   
File "C:\Users\rbernon\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\pool.py", line 463, in _handle_results
 task = get()   
File "C:\Users\rbernon\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\connection.py", line 251, in recv
 return _ForkingPickler.loads(buf.getbuffer()) 
AttributeError: Can't get attribute 'container' on <module '__main__' from
    'C:\\Users\\rbernon\\AppData\\Local\\Continuum\\Anaconda3\\lib\\site-packages\\spyder\\utils\\ipython\\start_kernel.py'>

Any help would be greatly appreciated!

Upvotes: 0

Views: 755

Answers (1)

Roland Smith

Reputation: 43495

Keep in mind that every piece of data you want to have processed needs to be pickled and sent to the worker processes.

The overhead of this will reduce (and might even eliminate) the advantages of using multiple processes.

If the data file is large, it is probably better to send each worker a start and end offset as a 2-tuple of numbers, so each worker can read part of the file and process it.
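A minimal sketch of that idea, assuming a hypothetical file named data.txt and using the chunk length as a stand-in for your real computation: only the (start, end) offset pairs are pickled and sent to the workers, and each worker opens the file and reads its own slice.

```python
import os
from multiprocessing import Pool

FILENAME = "data.txt"  # hypothetical file name

def process_chunk(bounds):
    # Each worker opens the file itself and reads only its slice.
    start, end = bounds
    with open(FILENAME, "rb") as f:
        f.seek(start)
        chunk = f.read(end - start)
    return len(chunk)  # stand-in for the real per-chunk computation

def main():
    # Sample data so the sketch is self-contained.
    with open(FILENAME, "wb") as f:
        f.write(b"x" * 1000)

    size = os.path.getsize(FILENAME)
    n_workers = 4
    step = size // n_workers

    # Build (start, end) offset pairs covering the whole file;
    # the last pair absorbs any remainder from integer division.
    bounds = [(i * step, (i + 1) * step) for i in range(n_workers)]
    bounds[-1] = (bounds[-1][0], size)

    with Pool(n_workers) as p:
        results = p.map(process_chunk, bounds)
    return results

if __name__ == '__main__':
    print(main())
```

Note that for text formats you would still need to align each offset to a record boundary (e.g. scan forward to the next newline), but the principle is the same: ship cheap offsets, not the data itself.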

Upvotes: 1
