Reputation: 45
A co-worker and I have run into an issue on our Macs (his: Intel, mine: M1). I'm on macOS 12.5.1 Monterey (not sure about his).
With Python 3.7, the following code works as expected:
from concurrent.futures import ProcessPoolExecutor

def foo(a, b=0):
    return a + b

with ProcessPoolExecutor(max_workers=4) as executor:
    future = executor.submit(foo, 1, b=2)
    print(future.result())
# prints "3"
But with Python 3.8 through 3.10, I get a traceback that looks like this:
Process SpawnProcess-1:
Traceback (most recent call last):
  File "/Users/user/.pyenv/versions/3.10.2/lib/python3.10/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/Users/user/.pyenv/versions/3.10.2/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/user/.pyenv/versions/3.10.2/lib/python3.10/concurrent/futures/process.py", line 237, in _process_worker
    call_item = call_queue.get(block=True)
  File "/Users/user/.pyenv/versions/3.10.2/lib/python3.10/multiprocessing/queues.py", line 122, in get
    return _ForkingPickler.loads(res)
AttributeError: Can't get attribute 'foo' on <module '__main__' (built-in)>
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "/Users/user/.pyenv/versions/3.10.2/lib/python3.10/concurrent/futures/_base.py", line 446, in result
    return self.__get_result()
  File "/Users/user/.pyenv/versions/3.10.2/lib/python3.10/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
If we start a Docker python:3.10-slim container and run the same code on the Mac, it works fine inside the container.
I can't find any concrete question or evidence that others have run into this problem, but this toy example fails on both our Macs. It seems to have trouble finding the definition of the foo function. I originally ran into this problem with Pebble, but have now reproduced it with the standard library.
Is there any history of problems with Python 3.8+ on Mac and concurrent.futures?
It was pointed out that you can check for __main__ in the toy example above, so I am including another example, using Pebble, that works everywhere except Python 3.8+ on Mac, where it throws the same sort of error. This is how I use Pebble in my code; it breaks only with the later Python versions, and only on a Mac:
from pebble import concurrent

class Foo:
    def __init__(self, timeout):
        self.timeout = timeout

    def do_math(self, a, b):
        # Define our task function
        @concurrent.process(timeout=self.timeout)
        def bar(a, b=0):
            return a + b
        future = bar(a, b)
        return future.result()

if __name__ == "__main__":
    foo = Foo(timeout=5)
    print(foo.do_math(2, 3))
# Prints 5, except on Mac Python 3.8+
Again, on Mac Python 3.8+ (only) it throws this error:
pebble.common.RemoteTraceback: Traceback (most recent call last):
  File "/Users/user/Projects/temp/venv/lib/python3.10/site-packages/pebble/concurrent/process.py", line 205, in _function_lookup
    return _registered_functions[name]
KeyError: 'Foo.do_math.<locals>.bar'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/user/Projects/temp/venv/lib/python3.10/site-packages/pebble/common.py", line 174, in process_execute
    return function(*args, **kwargs)
  File "/Users/user/Projects/temp/venv/lib/python3.10/site-packages/pebble/concurrent/process.py", line 194, in _trampoline
    function = _function_lookup(name, module)
  File "/Users/user/Projects/temp/venv/lib/python3.10/site-packages/pebble/concurrent/process.py", line 209, in _function_lookup
    function = getattr(mod, name)
AttributeError: module '__mp_main__' has no attribute 'Foo.do_math.<locals>.bar'
Upvotes: 1
Views: 1372
Reputation: 281330
Python 3.8 changed the default multiprocessing start method on Mac from fork to spawn, because forking was leading to crashes. (Fork-without-exec is just very precarious in general, and it can cause problems on non-Mac systems too, but Mac system frameworks in particular do not play well with forking.)
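To confirm which start method a given interpreter picks by default, a small check like the one below can help. It is only a sketch: describe_start_method is a hypothetical helper name, and the values it reports depend on the platform and Python version it runs on.

```python
import multiprocessing
import sys


def describe_start_method():
    """Return (default start method, list of supported start methods).

    Note: calling get_start_method() without allow_none=True also fixes
    the default for the rest of this process.
    """
    default = multiprocessing.get_start_method()
    available = multiprocessing.get_all_start_methods()
    return default, available


if __name__ == "__main__":
    default, available = describe_start_method()
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"Python {version} on {sys.platform}")
    print(f"default start method: {default}")
    print(f"supported start methods: {available}")
```

Running this on the Mac and inside the Linux container should show whether the two environments really use different start methods.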
Your code is unsafe to use with the spawn start method. In the first example, this is because you're missing an if __name__ == '__main__' guard. In the second example, it's because you're using a nested function, which cannot be loaded by the worker process.
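The nested-function problem can be seen without starting any process. The standard library sends functions to workers by pickling them, and pickle stores only a reference (module plus qualified name), which a fresh worker must be able to look up again. The sketch below uses hypothetical names (add, make_nested_add, is_picklable) to contrast the two cases; Pebble's traceback in the question shows a similar lookup by name failing.

```python
import pickle


def add(a, b=0):
    """Module-level function: a worker can find it by the name 'add'."""
    return a + b


def make_nested_add():
    """Return a function defined inside another function, like 'bar' in the question."""
    def nested_add(a, b=0):
        return a + b
    return nested_add


def is_picklable(obj):
    """Report whether pickle can serialize obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError):
        # Local (nested) functions have no importable name to refer to.
        return False
    return True


if __name__ == "__main__":
    nested = make_nested_add()
    # The qualified name contains '<locals>', so it cannot be resolved
    # by attribute lookup on the module.
    print(nested.__qualname__)
    print("module-level picklable:", is_picklable(add))
    print("nested picklable:", is_picklable(nested))
```

With fork, the child inherits the parent's memory, so the nested function already exists there; with spawn, the child starts from a clean interpreter and has to find everything by name.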
You need to make your code spawn-safe. Add if __name__ == '__main__' guards, stop trying to run nested functions in worker processes, and fix whatever else you might be doing that doesn't work with spawn.
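As a rough illustration, here is one way the second example could be restructured to be spawn-safe, using only the standard library rather than Pebble. This is a sketch under assumptions, not a drop-in replacement: the task function is moved to module level, the spawn context is requested explicitly so the behavior is the same on Mac and Linux, and the timeout is applied when waiting for the result.

```python
import multiprocessing
from concurrent.futures import ProcessPoolExecutor


def bar(a, b=0):
    """Task function at module level, so a spawned worker can import it."""
    return a + b


class Foo:
    def __init__(self, timeout):
        self.timeout = timeout

    def do_math(self, a, b):
        # Ask for spawn explicitly so the code behaves the same everywhere.
        ctx = multiprocessing.get_context("spawn")
        with ProcessPoolExecutor(max_workers=1, mp_context=ctx) as executor:
            future = executor.submit(bar, a, b)
            # Stops waiting after self.timeout seconds; it does not kill the worker.
            return future.result(timeout=self.timeout)


if __name__ == "__main__":
    # The guard keeps spawned workers from re-running this block on import.
    foo = Foo(timeout=5)
    print(foo.do_math(2, 3))
```

Save this as a script file and run it that way; code typed into an interactive session has no module file for spawned workers to import. Note that result(timeout=...) differs from Pebble's timeout, which terminates the process, so tasks that can hang still need Pebble or another mechanism. Because spawn is requested explicitly, the same script also exercises the spawn path inside a Linux container.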
You could try passing a fork context to pebble:
import multiprocessing

@concurrent.process(timeout=self.timeout, context=multiprocessing.get_context('fork'))
def bar(a, b=0):
    ...
but there's a good reason the default was changed. Using fork on Mac is likely to lead to weird crashes. If you're lucky, it'll crash immediately. If you're unlucky, you'll get an urgent call at 3 in the morning on a Saturday 5 months from now, when you've forgotten all about this and you have to figure out the problem from scratch.
Upvotes: 2
Reputation: 26667
Your code will run successfully if you add an if __name__ == "__main__" guard:
from concurrent.futures import ProcessPoolExecutor

def foo(a, b=0):
    return a + b

if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=4) as executor:
        future = executor.submit(foo, 1, b=2)
        print(future.result())
# prints "3"
Upvotes: 0