For the last couple of hours I have been trying to figure out a multiprocessing issue that appears when I cythonize my project so that it consists only of .pyd files. I do this because I have a requirement not to ship source code to a remote machine. I tried PyInstaller, but it is just so flaky: it was missing .dll files from multiple packages when I compiled, and even after I got it stable I would still prefer not to use it, since it can break simply by adding or updating the version of a single package.
After I compile the .pyd files, I create a main.py that imports the main startup function from the compiled main.pyd and kicks off the whole workflow. I would say the project is rather big, maybe 40-50+ files.
This is my setup.py that I used to generate the .pyd files:
from distutils.core import setup
from Cython.Build import cythonize
import os

# Recursively collect all Python files in the folder
def collect_files(base_dir):
    extensions = []
    for root, dirs, files in os.walk(base_dir):
        if "brainstorming" in dirs:
            dirs.remove("brainstorming")
        if "venv" in dirs:
            dirs.remove("venv")
        for file in files:
            if file.endswith(".py"):  # Include all Python files
                full_path = os.path.join(root, file)
                module_path = os.path.splitext(full_path.replace(base_dir + os.sep, "").replace(os.sep, "."))[0]
                extensions.append(full_path)
    return list(set(extensions))  # Ensure unique paths

base_dir = "."
extensions = collect_files(base_dir)

setup(
    name="myproj",
    ext_modules=cythonize(
        extensions,
        compiler_directives={"language_level": "3"},
    ),
    zip_safe=True,
)
At one point in my code I have the following logic:
Process(target=output_process, daemon=True, args=(<arg 1>, <arg2> ...)).start()
This is where I encounter the biggest issue: the multiprocessing module tries to pickle the cyfunction, and the spawned child then tries to dynamically import the module that the target function lives in, which fails miserably.
The output_process function is located in a folder called "lib" at the same level as the file I am calling it from, and inside the "lib" folder there is an outputprocess.py that contains the mentioned function.
Project/
|-- module/
| |-- lib/
| | |-- __init__.py
| | |-- outputprocess.py
| |-- __init__.py
| |-- module_that_calls_multiprocessing.py
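For context on why the child process needs to import the module at all: as far as I can tell, pickle serializes a top-level function by reference, storing only its module name and qualified name rather than its code, so unpickling in the spawned process requires re-importing that module. A quick illustration, using json.dumps purely as a stand-in:

```python
import pickle
import json

# pickle stores a top-level function as a reference (module + qualified name),
# not as bytecode, so the unpickling side must be able to import that module
payload = pickle.dumps(json.dumps)
print(b"json" in payload, b"dumps" in payload)  # both names are embedded
```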
No matter what I try, when it reaches the multiprocessing startup part I get the following exception:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\spaceflux\anaconda3\Lib\multiprocessing\spawn.py", line 122, in spawn_main
    exitcode = _main(fd, parent_sentinel)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\spaceflux\anaconda3\Lib\multiprocessing\spawn.py", line 132, in _main
    self = reduction.pickle.load(from_parent)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ModuleNotFoundError: No module named 'outputprocess'
The module is there, and when I run the project with the normal .py files everything works fine.
I also tried putting the function in the same file to test the behavior and I get:
  File "C:\Users\spaceflux\anaconda3\Lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
                  ^^^^^^^^^^^^^^^^^
  File "C:\Users\spaceflux\anaconda3\Lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\spaceflux\anaconda3\Lib\multiprocessing\context.py", line 336, in _Popen
    return Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^
  File "C:\Users\spaceflux\anaconda3\Lib\multiprocessing\popen_spawn_win32.py", line 94, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\spaceflux\anaconda3\Lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <cyfunction output_process at 0x0000023151445970>: import of module '<my module>' failed
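This second traceback looks like the same class of failure you get for any function that pickle cannot resolve back to an importable attribute. A small pure-Python reproduction of that failure mode (no Cython involved, purely for illustration):

```python
import pickle

def make_fn():
    # a locally defined function has no importable qualified name,
    # so pickling it fails the same way the cyfunction does above
    def inner():
        return 1
    return inner

try:
    pickle.dumps(make_fn())
    failed = False
except (pickle.PicklingError, AttributeError) as exc:
    failed = True
    print(type(exc).__name__)
```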
Any tips on how I can work around this? I like the .pyd approach because it does not take too much time to compile the files and it makes it possible to debug code and apply hotfixes faster in a release environment, but I am open to suggestions on other approaches or on how I can tackle this.