Muschel

Reputation: 368

PyInstaller-built Windows EXE fails with multiprocessing.pool

I'm using Python's multiprocessing.Pool class to create multiple worker processes in the following way:

from functools import partial
from multiprocessing import Pool

import numpy as np

def parallel_function(f):
    def easy_parallize(f, sequence):
        """ assumes f takes sequence as input, easy w/ Python's scope """
        pool = Pool(processes=8)  # depends on available cores
        result = pool.map(f, sequence)  # for i in sequence: result[i] = f(i)
        cleaned = [x for x in result if x is not None]  # drop None results
        cleaned = np.asarray(cleaned)
        pool.close()
        pool.join()
        return cleaned
    return partial(easy_parallize, f)

where f is the function doing the work and sequence is the list of parameters; see http://zqdevres.qiniucdn.com/data/20150702120338/index.html for the tutorial.

The project is being packaged into a Windows EXE using PyInstaller 3.1.1 with the --onedir option. PyInstaller creates the EXE without issues, and I can run the parts of the program that do not use multiprocessing without problems.

My problem comes up when I try to execute the parts of the program that use the multiprocessing function above. The program then fails with the following error message (written over and over, once by each child process):

  File "multiprocessing\context.py", line 148, in freeze_support
  File "multiprocessing\spawn.py", line 74, in freeze_support
  File "multiprocessing\spawn.py", line 106, in spawn_main
  File "multiprocessing\spawn.py", line 115, in _main
  File "multiprocessing\spawn.py", line 221, in prepare
  File "multiprocessing\context.py", line 231, in set_start_method
RuntimeError: context has already been set
classifier_v3_gui returned -1

The freeze_support call comes from a suggestion at https://github.com/pyinstaller/pyinstaller/wiki/Recipe-Multiprocessing, according to which the main block should contain the call as its first line:

if __name__ == "__main__":
    multiprocessing.freeze_support()

A similar problem has been discussed in the question "PyInstaller-built Windows EXE fails with multiprocessing", which refers to the solution in the link above. However:

a) the discussed solution for --onedir does not seem to work for me, as you can see from the error message I get;
b) I do not know how _Popen is related to Pool, so for --onefile I do not even know how to implement the class redefinition.

If I do not call multiprocessing.freeze_support() in main, the program behaves differently: the RuntimeError does not occur, but the usage instructions for my program are printed to the cmd window over and over. I assume this comes from each spawned process trying to run the EXE itself, which is obviously not what is supposed to happen.

Needless to say, the program runs without a problem as a .py script.

I am using 32-bit Python 3.4 on Windows 10 (I have had the same multiprocessing issue with Python 2.7 as well).

The only other solution I can think of is to rewrite my code to use multiprocessing.Process instead of multiprocessing.Pool, since there seems to be a fix for that case. If you have a reasonably low-effort way of doing what my Pool function is doing, I'll settle for that.

Upvotes: 3

Views: 3376

Answers (1)

JFM

Reputation: 73

Have you found a solution to your problem? I'm having the same kind of issue, but I managed to make it work with a simple example. However, it does not work yet with my complete code (which uses sklearn). If you manage to make it work, let me know; it might help me too.

Here is the code that worked for me:

import multiprocessing
import time

def func(param1, param2):
    print("hello " + str(param1))
    time.sleep(param2)
    print("Hello again " + str(param1))
    return "test " + str(param1)

def main():
    # one (param1, param2) tuple per task
    args = [("test1", 3),
            ("test2", 2),
            ("test3", 1)]

    with multiprocessing.Pool(multiprocessing.cpu_count()) as p:
        results = [p.apply_async(func, a) for a in args]
        for r in results:
            print("Returned " + r.get())

if __name__ == '__main__':
    multiprocessing.freeze_support()
    main()

Note that I'm using PyInstaller's "-D" (one-directory) option to build my application.
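For what it's worth, the same structure can probably be applied to the pool.map pattern from the question. The following is only a sketch and I have not tested it in a frozen build: square_or_none is a hypothetical stand-in for the real worker function, and drop_none and easy_parallelize are names I made up for illustration. The idea is to keep the worker at module level (so the child processes can import it) and to call freeze_support() before any Pool is created.

```python
import multiprocessing
from functools import partial


def square_or_none(x):
    """Hypothetical worker: returns None for negative input, else x squared."""
    return None if x < 0 else x * x


def drop_none(results):
    """Filter out the None entries, as the question's function does."""
    return [r for r in results if r is not None]


def easy_parallelize(f, sequence, processes=None):
    # f must be a module-level function so the child processes can import it.
    # processes=None lets Pool pick the number of workers from the CPU count.
    with multiprocessing.Pool(processes=processes) as pool:
        results = pool.map(f, sequence)  # blocks until all results are in
    return drop_none(results)


def main():
    parallel_square = partial(easy_parallelize, square_or_none)
    print(parallel_square([1, -2, 3]))


if __name__ == "__main__":
    # First statement in the main guard, before any Pool exists.
    multiprocessing.freeze_support()
    main()
```

The results come back as a plain list here; if you need a NumPy array as in the question, you can wrap the return value in np.asarray. Whether this avoids the "context has already been set" error with PyInstaller 3.1.1 is something I can't confirm.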

Upvotes: 2
