itaher

Reputation: 21

Using Multiprocessing with Modules

I am writing a module in which one function uses Pool from the multiprocessing library in Python 3.6. From what I have read, it seems that you cannot use if __name__=="__main__" because the code is not being run from main. I have also noticed that the Python pool processes show up in my task manager but appear to be stuck.

So for example:

class myClass()
    ...
    lots of different functions here
    ...
    def multiprocessFunc()
        do stuff in here
    def funcThatCallsMultiprocessFunc()
        array=[array of filenames to be called]
        if __name__=="__main__":
            p = Pool(processes=20)
            p.map_async(multiprocessFunc,array)

I tried removing the if __name__=="__main__" part, but still no dice. Any help would be appreciated.

Upvotes: 1

Views: 237

Answers (2)

quamrana

Reputation: 39404

It seems to me that you have just missed out a self. from your code. I should think this will work:

from multiprocessing import Pool

class myClass():
    ...
    # lots of different functions here
    ...
    def multiprocessFunc(self, file):
        pass  # do stuff in here
    def funcThatCallsMultiprocessFunc(self):
        array = [...]  # list of filenames to be processed
        p = Pool(processes=20)
        p.map_async(self.multiprocessFunc, array)  # added self. here

Now, having done some experiments, I see that map_async can take quite some time to start up (I think because multiprocessing has to create the processes), so any test code might call funcThatCallsMultiprocessFunc and then quit before the Pool has got started.

In my tests I had to wait for over 10 seconds after funcThatCallsMultiprocessFunc before calls to multiprocessFunc started. But once started, they seemed to run just fine.

This is the actual code I've used:

MyClass.py

from multiprocessing import Pool

import time
import string

class myClass():
    def __init__(self):
        self.result = None
    def multiprocessFunc(self, f):
        time.sleep(1)
        print(f)
        return f
    def funcThatCallsMultiprocessFunc(self):
        array = [c for c in string.ascii_lowercase]
        print(array)
        p = Pool(processes=20)
        p.map_async(self.multiprocessFunc, array, callback=self.done)
        p.close()
    def done(self, arg):
        self.result = 'Done'
        print('done', arg)

Run.py

from MyClass import myClass

import time

def main():
    c = myClass()
    c.funcThatCallsMultiprocessFunc()
    for i in range(30):
        print(i, c.result)
        time.sleep(1)

if __name__=="__main__":
    main()
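Instead of polling c.result in a sleep loop, the caller can also block on the AsyncResult object that map_async returns. The following is only a minimal sketch of that idea: the class name FileProcessor, the methods process_one and process_all, the file names and the 60-second timeout are all hypothetical, and the worker body is a placeholder.

```python
from multiprocessing import Pool


class FileProcessor:
    """Hypothetical stand-in for myClass; all names here are illustrative."""

    def process_one(self, name):
        # Placeholder work (assumption): real code would process the file.
        return name.upper()

    def process_all(self, names, processes=4):
        # Leaving the with-block terminates the pool, so the results are
        # collected with get() while still inside it.
        with Pool(processes=processes) as pool:
            async_result = pool.map_async(self.process_one, names)
            # get() blocks until every task has finished and re-raises
            # any exception that was raised in a worker.
            return async_result.get(timeout=60)


if __name__ == "__main__":
    print(FileProcessor().process_all(["a.txt", "b.txt", "c.txt"]))
```

Because get() re-raises worker exceptions in the calling process, this approach may also help to show why a pool appears to be stuck.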

Upvotes: 1

Dschoni

Reputation: 3872

The if __name__=='__main__' construct is an import guard. You want to use it to stop multiprocessing from re-running your setup code on import (on Windows, each worker process imports the main module when it starts).

In your case, you can leave out this protection in the class setup. Be sure to protect the execution points of the class in the calling file like this:

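import multiprocessing as mp

# parallel_function, callback_function, x, y and z are assumed to be defined elsewhere
def apply_async_with_callback():
    pool = mp.Pool(processes=30)
    for i in range(z):
        pool.apply_async(parallel_function, args=(i, x, y), callback=callback_function)
    pool.close()
    pool.join()
    print("Multiprocessing done!")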

if __name__ == '__main__':
    apply_async_with_callback()
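To make the pattern above concrete, here is a self-contained sketch. The bodies of parallel_function and callback_function, the module-level results list and the default values for x, y and z are assumptions made purely for illustration; they are not from the original answer.

```python
import multiprocessing as mp

results = []  # filled in the parent process by the callback


def parallel_function(i, x, y):
    # Placeholder computation (assumption); runs in a worker process.
    return i * x + y


def callback_function(value):
    # Callbacks are invoked in the parent process (by a pool helper
    # thread), so the parent's list is the one being appended to.
    results.append(value)


def apply_async_with_callback(z=5, x=2, y=1):
    pool = mp.Pool(processes=2)
    for i in range(z):
        pool.apply_async(parallel_function, args=(i, x, y),
                         callback=callback_function)
    pool.close()  # no more tasks will be submitted
    pool.join()   # wait for the workers to finish
    return sorted(results)


if __name__ == "__main__":
    # Only the script that is actually executed reaches this point;
    # worker processes that import this file skip it.
    print(apply_async_with_callback())
```

The worker and callback functions sit at module level so that the worker processes can find them by name, while the only code that creates a pool is behind the guard.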

Upvotes: 0
