harbun

Reputation: 525

using multiprocessing in a subprocess

On Windows, there must be a check whether the process is the main process before multiprocessing can be used; otherwise the script spawns processes in an infinite loop.

I tried changing the name of the process to the name of the subprocess so I could use multiprocessing from within a class or function that I call, but no luck. Is this even possible? So far I have failed to use multiprocessing unless it was from the main process.

If it is possible, could someone provide an example of how to use multiprocessing within a class or function that is called from a higher-level process? Thanks.

Edit:

Here is an example. The first one works, but everything is in one file: simplemtexample3.py:

import random
import multiprocessing
import math

def mp_factorizer(nums, nprocs):
    # Guard the process: only the main process may spawn workers.
    if __name__ == '__main__':
        out_q = multiprocessing.Queue()
        chunksize = int(math.ceil(len(nums) / float(nprocs)))
        procs = []
        for i in range(nprocs):
            p = multiprocessing.Process(
                    target=worker,
                    args=(nums[chunksize * i:chunksize * (i + 1)],
                          out_q))
            procs.append(p)
            p.start()

        # Collect the results. We know how many lists to expect.
        resultlist = []
        for i in range(nprocs):
            temp = out_q.get()
            # Keep only the first character of each result string.
            resultlist.extend(s[0] for s in temp)

        # Wait for all worker processes to finish.
        for p in procs:
            p.join()

        return [x for x in resultlist if x != []]

def worker(nums, out_q):
    """The worker function, invoked in a separate process. 'nums' is a
    list of numbers to process; the results are collected in a list
    that is pushed onto a queue.
    """
    outlist = []
    for n in nums:
        newnumber = n * 2
        newnumberasstring = str(newnumber)
        if newnumber:
            outlist.append(newnumberasstring)
    out_q.put(outlist)

l = []
for i in range(80):
    l.append(random.randint(1,8))

print(mp_factorizer(l, 4))

However, when I try to call mp_factorizer from another file, it does not work because of the if __name__ == '__main__' check:

simplemtexample.py

import random
import multiprocessing
import math

def mp_factorizer(nums, nprocs):
    # Guard the process: only the main process may spawn workers.
    if __name__ == '__main__':
        out_q = multiprocessing.Queue()
        chunksize = int(math.ceil(len(nums) / float(nprocs)))
        procs = []
        for i in range(nprocs):
            p = multiprocessing.Process(
                    target=worker,
                    args=(nums[chunksize * i:chunksize * (i + 1)],
                          out_q))
            procs.append(p)
            p.start()

        # Collect the results. We know how many lists to expect.
        resultlist = []
        for i in range(nprocs):
            temp = out_q.get()
            # Keep only the first character of each result string.
            resultlist.extend(s[0] for s in temp)

        # Wait for all worker processes to finish.
        for p in procs:
            p.join()

        return [x for x in resultlist if x != []]

def worker(nums, out_q):
    """The worker function, invoked in a separate process. 'nums' is a
    list of numbers to process; the results are collected in a list
    that is pushed onto a queue.
    """
    outlist = []
    for n in nums:
        newnumber = n * 2
        newnumberasstring = str(newnumber)
        if newnumber:
            outlist.append(newnumberasstring)
    out_q.put(outlist)

startsimplemtexample.py

import simplemtexample as smt
import random

l = []
for i in range(80):
    l.append(random.randint(1,8))

print(smt.mp_factorizer(l, 4))

Upvotes: 2

Views: 1792

Answers (2)

harbun

Reputation: 525

if __name__ == '__main__' is mandatory (at least on Windows) if one wants to use multiprocessing.

On Windows it works like this: for every worker process you want to spawn, Windows automatically starts a new Python interpreter, which re-imports the main script and every file it needs. However, only the process that was started first is named '__main__'. This is why guarding the body of mp_factorizer with if __name__ == '__main__' prevents multiprocessing from entering an infinite loop.

So essentially, Windows needs to re-read the file that contains the worker function, plus every function the worker calls, once for each worker. By guarding the body of mp_factorizer we make sure that no additional workers are spawned during those re-imports, while Windows can still execute the worker function. This is why single-file multiprocessing examples guard the code that creates the workers (as mp_factorizer does here) but not the worker function itself: Windows must still be able to run the worker. If all the code were in one file and the entire file were guarded, no worker could be created.

If the multiprocessing code is located in another module and is called from there, if __name__ == '__main__' needs to be placed directly above the call: mpteststart.py

import random
import mptest as smt

l = []
for i in range(4):
    l.append(random.randint(1,8))
print("random numbers generated")
if __name__ == '__main__':
    print(smt.mp_factorizer(l, 4))

mptest.py

import multiprocessing
import math

print("Reading mptest.py file")

def mp_factorizer(nums, nprocs):

    out_q = multiprocessing.Queue()
    chunksize = int(math.ceil(len(nums) / float(nprocs)))
    procs = []
    for i in range(nprocs):
        p = multiprocessing.Process(
                target=worker,
                args=(nums[chunksize * i:chunksize * (i + 1)],
                      out_q))
        procs.append(p)
        p.start()

    # Collect the results. We know how many lists to expect.
    resultlist = []
    for i in range(nprocs):
        temp = out_q.get()
        # Keep only the first character of each result string.
        resultlist.extend(s[0] for s in temp)

    # Wait for all worker processes to finish.
    for p in procs:
        p.join()

    return [x for x in resultlist if x != []]

def worker(nums, out_q):
    """The worker function, invoked in a separate process. 'nums' is a
    list of numbers to process; the results are collected in a list
    that is pushed onto a queue.
    """
    print("worker started")
    outlist = []
    for n in nums:
        newnumber = n * 2
        newnumberasstring = str(newnumber)
        if newnumber:
            outlist.append(newnumberasstring)
    out_q.put(outlist)

In the code above, if __name__ == '__main__' has been removed, since the check is already in the calling file.

However, the result is somewhat unexpected:

Reading mptest.py file
random numbers generated
Reading mptest.py file
random numbers generated
worker started
Reading mptest.py file
random numbers generated
worker started
Reading mptest.py file
random numbers generated
worker started
Reading mptest.py file
random numbers generated
worker started
['1', '1', '4', '1']

Multiprocessing is prevented from endless execution, but the rest of the code is still executed several times (the random number generation in this case). This not only costs performance, it can also lead to other nasty bugs. The solution is to protect the whole main process from being re-executed by Windows whenever multiprocessing is used somewhere down the line: mpteststart.py

import random
import mptest as smt

if __name__ == '__main__':
    l = []
    for i in range(4):
        l.append(random.randint(1, 8))
    print("random numbers generated")
    print(smt.mp_factorizer(l, 4))

Now we get only the desired result; the random numbers are generated just once:

Reading mptest.py file
random numbers generated
Reading mptest.py file
worker started
Reading mptest.py file
worker started
Reading mptest.py file
worker started
Reading mptest.py file
worker started
['1', '6', '2', '1']

Note that in this example, mpteststart.py is the main process. If it were not, if __name__ == '__main__' would have to be moved up the calling chain until it reaches the main process. Once the main process is protected that way, there is no more unwanted repeated code execution.
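An alternative way to keep module-level side effects out of the re-imported files is to wrap all top-level work in an entry-point function, so the only unguarded top-level statement is the guard itself. A minimal sketch of that pattern using multiprocessing.Pool (the names double, double_all and main are illustrative, not from the original code):

```python
import random
import multiprocessing

def double(n):
    # Work performed in a worker process.
    return n * 2

def double_all(nums, nprocs=4):
    # A pool handles chunking, queueing and joining for us, so the
    # multiprocessing code can live in an ordinary function.
    with multiprocessing.Pool(processes=nprocs) as pool:
        return pool.map(double, nums)

def main():
    nums = [random.randint(1, 8) for _ in range(4)]
    print(double_all(nums))

if __name__ == '__main__':
    # The only top-level statement: nothing else re-runs on re-import.
    main()
```

Because random number generation happens inside main(), a re-import of this file executes no work at all.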

Upvotes: 2

unutbu

Reputation: 879361

Windows lacks os.fork. So on Windows, the multiprocessing module starts a new Python interpreter and (re)imports the script that calls multiprocessing.Process.

The purpose of using if __name__ == '__main__' is to protect the call to multiprocessing.Process from being run again when the script is re-imported. (If you don't protect it, you get a fork bomb.)
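A minimal sketch of that guard (the worker function and its message are illustrative):

```python
import multiprocessing

def worker(q):
    # Runs in the child process and reports back through the queue.
    q.put('hello from worker')

if __name__ == '__main__':
    # On Windows this file is re-imported in every child interpreter;
    # the guard ensures only the original process spawns children.
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(q,))
    p.start()
    print(q.get())
    p.join()
```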

If you are calling multiprocessing.Process from within a class or a function that will not get called when the script is reimported, then there is no problem. Just go ahead and use multiprocessing.Process as usual.
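For instance, a library-style function that spawns processes needs no guard of its own, because its body only runs when it is actually called; only the entry point that invokes it must be protected. A sketch under that assumption (parallel_double and _double_worker are illustrative names):

```python
import multiprocessing

def _double_worker(n, out_q):
    # Child-process side: compute one result and send it back.
    out_q.put(n * 2)

def parallel_double(nums):
    """Spawn one process per number; safe to define unguarded, since
    the Process calls only run when this function is called."""
    out_q = multiprocessing.Queue()
    procs = []
    for n in nums:
        p = multiprocessing.Process(target=_double_worker, args=(n, out_q))
        procs.append(p)
        p.start()
    # Queue order is nondeterministic, so sort the collected results.
    results = sorted(out_q.get() for _ in procs)
    for p in procs:
        p.join()
    return results

if __name__ == '__main__':
    # Only the entry point needs the guard.
    print(parallel_double([1, 2, 3]))
```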

Upvotes: 1
