nonagon

Reputation: 3473

Spawn a process in Python without forking

I'm working with Python (2.7) and pymongo (3.3), and I need to spawn a child process to run a job asynchronously. Unfortunately pymongo is not fork-safe as described here (and I need to interact with the db before spawning the child process).

I ran an experiment using subprocess.Popen (with shell set to True and then False) and multiprocessing.Process. As far as I can tell they both fork the parent process to create the child process, but only multiprocessing.Process causes pymongo to print its warning that it has detected a forked process.

I'm wondering what the pythonic way of doing this is. It seems that perhaps os.system will do it for me but subprocess is described as an intended replacement for os.system so I wonder whether I'm missing something.

Upvotes: 3

Views: 4717

Answers (3)

A. Jesse Jiryu Davis

Reputation: 24007

I think you've misread the warning. PyMongo's documentation says that a single MongoClient is not fork-safe; you've taken that to mean that PyMongo prohibits your whole program from ever creating subprocesses.

Not fork-safe means you must not create a MongoClient before forking and then use that same MongoClient object after the fork. Using PyMongo elsewhere in your program, or using one MongoClient before a fork and a different one after, is safe.

That's why subprocess.Popen is ok: you fork, then exec (to replace your program with a different one in the child process), and therefore you cannot possibly use the same MongoClient in the child afterward.
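A minimal sketch of that pattern, assuming the job can run as its own Python program. CHILD_CODE and spawn_job are hypothetical stand-ins, not part of PyMongo; a real child would be a separate script that creates its own MongoClient once it starts.

```python
import os
import subprocess
import sys

# Hypothetical child job: here it only reports its PID. A real job would
# build its own MongoClient inside the new process.
CHILD_CODE = "import os; print(os.getpid())"


def spawn_job():
    # The child is a fresh interpreter, so no Python objects from the
    # parent (including any MongoClient) are carried over into it.
    return subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.PIPE,
    )


proc = spawn_job()
out, _ = proc.communicate()
child_pid = int(out.decode().strip())
```

The parent can keep using its existing MongoClient before and after spawn_job(), since the child never sees that object.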

To quote the PyMongo FAQ:

On Unix systems the multiprocessing module spawns processes using fork(). Care must be taken when using instances of MongoClient with fork(). Specifically, instances of MongoClient must not be copied from a parent process to a child process. Instead, the parent process and each child process must create their own instances of MongoClient. For example:

# Each process creates its own instance of MongoClient.
def func():
    db = pymongo.MongoClient().mydb
    # Do something with db.

proc = multiprocessing.Process(target=func)
proc.start()

Never do this:

client = pymongo.MongoClient()

# Each child process attempts to copy a global MongoClient
# created in the parent process. Never do this.
def func():
  db = client.mydb
  # Do something with db.

proc = multiprocessing.Process(target=func)
proc.start()

Instances of MongoClient copied from the parent process have a high probability of deadlock in the child process due to inherent incompatibilities between fork(), threads, and locks. PyMongo will attempt to issue a warning if there is a chance of this deadlock occurring.

Upvotes: 4

Serge Ballesta

Reputation: 148890

Not fork-safe does not mean that you cannot call fork. It just means that the child process should not use any inherited PyMongo instance. When you use subprocess.Popen, the newly forked child almost immediately calls exec to be replaced by a shell instance (shell=True) or the required executable (shell=False). So it is safe from PyMongo's point of view.

By contrast, when you call multiprocessing.Process, the child is a copy of the parent and keeps its PyMongo instances. Using those inherited instances in the child is unsafe, so the warning message was correctly issued.

Upvotes: 6

ShadowRanger

Reputation: 155363

If you're able to move to Python 3.4 or higher, you could set your multiprocessing start method to 'forkserver' before using pymongo. That starts a fork server process, and all later multiprocessing children are forked from that server rather than from your main process. Once the fork server is set up, your main process can use pymongo freely: the fork server itself never touched pymongo, so forking from it causes no issues.

Sadly, start methods were only added in 3.4, so it's not an option for 2.7, but if someone else has this issue, it may be useful to them.
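For anyone on Python 3.4+ and a Unix platform, a rough sketch of the idea follows. The names job and run_job are hypothetical, and it uses multiprocessing.get_context("forkserver") rather than the global set_start_method so that the choice stays local to this code; either should work.

```python
import multiprocessing


def job():
    # Runs in the child. A real job would create its own MongoClient here
    # (omitted so the sketch has no third-party dependency).
    pass


def run_job():
    # Children of this context are forked from the fork server,
    # not from the calling process.
    ctx = multiprocessing.get_context("forkserver")
    proc = ctx.Process(target=job)
    proc.start()
    proc.join()
    return proc.exitcode


if __name__ == "__main__":
    print("child exit code:", run_job())
```

The `if __name__ == "__main__":` guard matters here because the child imports the main module to find job, so the target has to be defined at module level in a real file.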

Upvotes: 3
