Reputation: 11096
I wonder whether it is possible to pass a method of another class, which has lots of imports and is pretty dense, to an instance of multiprocessing.Process as an argument. Note that I am going to run this code on a Unix-based machine, so Process will fork rather than spawn. Here's an example:
#class1.py
from class3 import Class3

class Class1(object):
    def __init__(self):
        self.class3Instance = Class3()

    def func1(self):
        self.class3Instance.func3()
#class3.py
import numpy as np
import pandas
import cv2  # OpenCV library
# there are many other things that I am importing here

class Class3(object):
    def __init__(self):
        pass

    def func3(self):
        np.random.seed(1)
        print('func3 changed the random seed')
#class2.py
import numpy as np

class Class2(object):
    def __init__(self):
        pass

    def func2(self, funcInput):
        funcInput()
#main.py
from multiprocessing import Process
from class1 import Class1
from class2 import Class2

class1Instance = Class1()
class2Instance = Class2()

class2Process = Process(target=class2Instance.func2, kwargs={'funcInput': class1Instance.func1})
class2Process.start()
class2Process.join()
This example seems to work fine at such a small scale, but I'm afraid multiprocessing.Process will not be able to fork things properly in this case and will instead try to make a full copy of the classes in the hierarchy. I do not want that to happen. Is that a valid concern?
Upvotes: 3
Views: 1348
Reputation: 155363
multiprocessing.Process, used in fork mode, will not need to pickle the bound method (which would require pickling the instance), so there is minimal up-front cost. There is no documented guarantee of this AFAICT, but CPython's fork-based implementation doesn't pickle the target, and there is no reason for it to, so I can't see that behavior going away when there is nothing to be gained by removing it.
That said, the nature of CPython's reference-counted design (with a cyclic garbage collector to handle the failures of reference counting) is such that the object header of every Python object is touched intermittently, which causes any page containing a small object to get copied. So while the CPU work of actually performing a serialize/deserialize cycle won't occur, a long-running Process will usually end up sharing few pages with the parent process.
Also note that multiprocessing.Process in fork mode is the only case where you benefit from this. The forkserver and spawn start methods don't get copy-on-write copies of the parent's pages, so they can't benefit, and multiprocessing.Pool methods like apply/apply_async and the various map-like functions always pickle both the function to be called and its arguments (the worker processes don't know what tasks they'll be asked to run when they're forked, and the objects could have changed post-fork, so the tasks are repickled every time).
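To see that last point in action, here's a small sketch along the same lines as the one above (again with a made-up Unpicklable class holding an open file handle): the same kind of bound method that works as a Process target under fork fails with Pool.apply, because the pool must pickle every task to hand it to a worker.

#pool_pickles_demo.py (hypothetical; assumes a Unix machine)
import multiprocessing

class Unpicklable(object):
    def __init__(self):
        # an open file handle makes the instance (and any bound
        # method of it) unpicklable
        self.handle = open('/dev/null', 'w')

    def work(self):
        print('ran in a worker')

if __name__ == '__main__':
    multiprocessing.set_start_method('fork')
    obj = Unpicklable()
    with multiprocessing.Pool(1) as pool:
        try:
            # Pool.apply pickles the function and its arguments to send
            # them to a worker, even in fork mode, so this raises
            pool.apply(obj.work)
        except TypeError as e:
            print('Pool had to pickle the task and failed:', e)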
Upvotes: 3