Peter B

Reputation: 493

Is python multiprocessing with inter-process signalling via a global flag variable safe?

I'm running many subprocesses (more than I have cores) and, if one meets a certain condition, I set the value of a global flag variable, bailout.

Then if bailout was set, all subsequent subprocesses exit as quickly as possible.

See e.g. this simple example, where I multiply the results of my 20 calls to a loop() function, but I "bail out" if any of those calls returns zero:

import sys
import random
import multiprocessing

def loop(tup):
    global bailout
    if bailout==1:                    # obey a global bail out "flag"
        return 0
    x = random.random() - 0.5
    if x < 0:
        bailout = 1                   # set a global bail out "flag"
        return 0
    return x

def top():
    global bailout
    bailout = 0
    runtups = 20 * [[0]]              # a dummy parameter [0] for function "loop"
    pool = multiprocessing.Pool()
    results = pool.imap(loop, runtups)
    pool.close()
    res = 1
    sys.stdout.write("1")
    for result in results:
        sys.stdout.write(" * %g" % result)
        res = res * result
    sys.stdout.write(" = %g\n" % res)

top()

It works just fine (or, to be precise, it has worked every time I've tried it). For example, my desktop has 4 cores, and if one of the first 4 subprocesses sets bailout to 1 (as almost always happens in this example), then all subsequent runs exit on the if bailout==1 condition.

But is it safe?

I mean, all a subprocess can ever do is set bailout to 1. But what if two subprocesses both want to set bailout to 1? Is it possible for them to attempt it at the same time, causing bailout to become undefined? Or is it guaranteed that this will never happen (perhaps because the top-level process always handles the completed subprocesses serially)?

Upvotes: 1

Views: 718

Answers (2)

user3666197

Reputation: 1

Is it possible?
Is it safe?
Is it guaranteed?

While GIL-stepping indeed makes all thread-based (not subprocess-based) multiprocessing efforts still appear as a pure-[SERIAL] processing flow, the question here is more about the principal approach and whether all of the issues raised above are safely satisfied.


Rather, do not try to go against the documented advice:

It is best to quote the explicit statements from the documentation:

16.6.3. Programming guidelines

...

Explicitly pass resources to child processes

On Unix a child process can make use of a shared resource created in a parent process using a global resource. However, it is better to pass the object as an argument to the constructor for the child process.

Apart from making the code (potentially) compatible with Windows this also ensures that as long as the child process is still alive the object will not be garbage collected in the parent process. This might be important if some resource is freed when the object is garbage collected in the parent process.


and

16.6.3.2 Windows

...

Global variables

Bear in mind that if code run in a child process tries to access a global variable, then the value it sees (if any) may not be the same as the value in the parent process at the time that Process.start was called.

However, global variables which are just module level constants cause no problems.


Besides the native pythonic tools for communicating or "sharing" state (which I advocate, where possible, never sharing at all), there are smart tools for multi-agent designs, where each worker may use other lightweight communication tools and be less performance-penalised than the native GIL-stepped operations allow (ref. ZeroMQ, nanomsg et al.).
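
As a rough sketch of the "explicitly pass resources" guideline quoted above, a bail-out flag can be handed to each child process as an explicit argument, e.g. as a multiprocessing.Event, rather than read from a module-level global. The worker and stop_flag names below are illustrative, not taken from the question:

import multiprocessing

def worker(stop_flag, x):             # illustrative worker, not the OP's loop()
    if stop_flag.is_set():            # the flag was passed in explicitly
        return
    if x < 0:
        stop_flag.set()               # visible to the parent and to sibling workers

if __name__ == '__main__':
    stop_flag = multiprocessing.Event()
    procs = [multiprocessing.Process(target=worker, args=(stop_flag, x))
             for x in (3, -1, 5)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print("bailed out:", stop_flag.is_set())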

Upvotes: 0

Jim Stewart

Reputation: 17323

Globals are not shared between processes. If you add some logging to loop, you can see what's really going on:

import os     # needed for os.getpid() in the log messages below

def loop(tup):
    global bailout
    if bailout==1:
        print(f'pid {os.getpid()} had bailout 1')
        return 0
    x = random.random() - 0.5
    if x < 0:
        print(f'pid {os.getpid()} setting bailout 1')
        bailout = 1
        return 0
    return x

This will produce output like:

pid 30011 setting bailout 1
pid 30013 setting bailout 1
pid 30015 setting bailout 1
pid 30009 setting bailout 1
pid 30010 setting bailout 1
pid 30011 had bailout 1
pid 30013 had bailout 1
pid 30009 had bailout 1
pid 30014 setting bailout 1
pid 30015 had bailout 1
pid 30010 had bailout 1
pid 30011 had bailout 1
1 * 0.494123 * 0.0704172 * 0 * 0.10829 * 0 * 0.465238 * 0 * 0.0638724 * 0 * 0 * 0 * 0.227231 * 0 * 0 * 0 * 0 * 0 * 0 * 0.463628 * 0.372984 = 0

What's happening is that multiprocessing.Pool() is starting 4 processes, which are being re-used as they become available. So while processing the 20 items in runtups, eventually each individual process has its own copy of bailout set to 1. When that process is reused, it triggers the bailout clause.

Since you're randomly deciding when to set bailout = 1, it's possible for it to never happen while processing the 20 items, or it may happen in some processes but not others, so you may not get the same results I pasted above, but at least some of the processes are likely to go into bailout mode.

If you're looking for a reliable way to share state between processes, check out https://docs.python.org/3/library/multiprocessing.html#sharing-state-between-processes.
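
As a minimal sketch of one option from that page, the original loop() could read and write a genuinely shared flag by passing a multiprocessing.Value to each worker through the pool's initializer (init_worker is an illustrative name, and whether a plain shared int is the right primitive for your real workload is an assumption):

import random
import multiprocessing

def init_worker(shared_flag):
    global bailout
    bailout = shared_flag             # every worker process receives the same Value

def loop(tup):
    if bailout.value == 1:            # read the shared flag
        return 0
    x = random.random() - 0.5
    if x < 0:
        with bailout.get_lock():      # serialise concurrent writers
            bailout.value = 1
        return 0
    return x

def top():
    flag = multiprocessing.Value('i', 0)      # shared int, initially 0
    pool = multiprocessing.Pool(initializer=init_worker, initargs=(flag,))
    results = list(pool.imap(loop, 20 * [[0]]))
    pool.close()
    pool.join()
    print(results)

if __name__ == '__main__':
    top()

The difference from the original code is that every worker now sees the same shared integer, rather than its own per-process copy of a module-level global, so setting the flag in one worker really does short-circuit the remaining tasks.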

Upvotes: 2
