Martin Todorov

Reputation: 3

Multiprocessing not working with cythonize project

For the last couple of hours I have been trying to figure out a specific issue with multiprocessing when I cythonize my project so that it consists only of .pyd files. I do this because I have a requirement not to ship source code to a remote machine. I tried PyInstaller, but it is very flaky and misses .dll files from multiple packages when I compile; even after I got it stable I would still prefer not to use it, since it can break just by adding or updating the version of a single package.

After I compile the .pyd files I create a main.py file which imports the main startup function from the compiled main.pyd and kicks off the whole workflow. I would say the project is rather big, with maybe 40-50+ files.
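For context, the launcher is just a thin plain-Python entry point along these lines (the module and function names here are illustrative, not my real ones):

```python
# main.py -- the only plain .py file shipped; all real code lives in .pyd files
import importlib


def launch(module_name="app_main", func_name="startup"):
    """Import the compiled entry module and call its startup function."""
    mod = importlib.import_module(module_name)
    return getattr(mod, func_name)()


if __name__ == "__main__":
    launch()
```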

This is my setup.py that I used to generate the .pyd files:

from distutils.core import setup
from Cython.Build import cythonize
import os

# Recursively collect all Python files in the folder
def collect_files(base_dir):
    extensions = []
    for root, dirs, files in os.walk(base_dir):
        # Prune directories that should not be compiled
        if "brainstorming" in dirs:
            dirs.remove("brainstorming")
        if "venv" in dirs:
            dirs.remove("venv")
        for file in files:
            if file.endswith(".py"):  # Include all Python files
                extensions.append(os.path.join(root, file))
    return list(set(extensions))  # Ensure unique paths

base_dir = "."
extensions = collect_files(base_dir)

setup(
    name="myproj",
    ext_modules=cythonize(
        extensions,
        compiler_directives={
            "language_level": "3"}
    ),
    zip_safe=True,
)

At one point in my code I have the following logic:

Process(target=output_process, daemon=True, args=(<arg 1>, <arg2> ...)).start()

This is where I hit the biggest issue: the multiprocessing module tries to pickle the cyfunction and then dynamically re-import the module that the target function lives in inside the child process, and this fails miserably.
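For reference, with the spawn start method (the default on Windows) a `Process` target is pickled by reference (module name plus qualified name), and the child must be able to re-import that module to resolve it; a minimal illustration with a stdlib function standing in for the target:

```python
import pickle
import json  # json.dumps is just a stand-in for any importable function

# Functions pickle as a reference -- the module name and qualified name,
# not the function's code. The unpickling side must import the module.
payload = pickle.dumps(json.dumps)

# The module name travels in the pickle stream...
assert b"json" in payload

# ...and unpickling re-imports it to resolve the same object.
assert pickle.loads(payload) is json.dumps
```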

The "output_process" function is located in a folder called "lib" at the same level as the file I am calling it from, and inside the "lib" folder there is an "outputprocess.py" that holds the mentioned function.

Project/
|-- module/
|   |-- lib/
|   |   |-- __init__.py
|   |   |-- outputprocess.py 
|   |-- __init__.py
|   |-- module_that_calls_multiprocessing.py

No matter what I try when it gets to the multiprocess startup part I get the following exception:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\spaceflux\anaconda3\Lib\multiprocessing\spawn.py", line 122, in spawn_main
    exitcode = _main(fd, parent_sentinel)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\spaceflux\anaconda3\Lib\multiprocessing\spawn.py", line 132, in _main
    self = reduction.pickle.load(from_parent)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ModuleNotFoundError: No module named 'outputprocess'

The module is there, and when I run the project with the normal .py files everything works fine.

I also tried putting the function in the same file to test the behavior, and I get:

File "C:\Users\spaceflux\anaconda3\Lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
                  ^^^^^^^^^^^^^^^^^
  File "C:\Users\spaceflux\anaconda3\Lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\spaceflux\anaconda3\Lib\multiprocessing\context.py", line 336, in _Popen
    return Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^
  File "C:\Users\spaceflux\anaconda3\Lib\multiprocessing\popen_spawn_win32.py", line 94, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\spaceflux\anaconda3\Lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <cyfunction output_process at 0x0000023151445970>: import of module '<my module>' failed

Any tips on how I can work around this? I like the .pyd approach because it does not take too much time to compile the files and it makes it possible to debug code and apply hotfixes faster in a release environment, but I am open to suggestions on other approaches or on how to tackle this.
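One idea I have not fully tested yet is to keep a single plain-Python trampoline module out of the cythonize list, so the thing multiprocessing pickles is an ordinary Python function and the compiled code is only imported inside the child (the import path below is assumed from my layout):

```python
# lib/proc_entry.py -- keep this one file as plain .py, excluded in setup.py

def run_output_process(*args, **kwargs):
    # Imported lazily so the compiled .pyd is resolved inside the child
    # process, after multiprocessing has finished bootstrapping it.
    from lib.outputprocess import output_process  # path assumed from my layout
    return output_process(*args, **kwargs)
```

The call site would then be `Process(target=run_output_process, daemon=True, args=(...)).start()`, so only the plain-Python `run_output_process` gets pickled.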

Upvotes: 0

Views: 49

