Eric Thomas

Reputation: 717

What is being pickled when I call multiprocessing.Process?

I know that multiprocessing uses pickling in order to have the processes run on different CPUs, but I think I am a little confused as to what is being pickled. Let's look at this code.

from multiprocessing import Process

def f(I):
    print('hello world!',I)

if __name__ == '__main__':
    for I in range(1, 3):
        Process(target=f,args=(I,)).start()

I assume that what gets pickled is the function f and the argument passed to it. First, is this assumption correct?

Second, let's say f(I) has a function call within it, like:

def f(I):
    print('hello world!',I)
    randomfunction()

Does randomfunction's definition get pickled as well, or only the function call?

Furthermore, if that function were defined in another file, would the process be able to call it?

Upvotes: 11

Views: 5354

Answers (3)

dano

Reputation: 94951

In this particular example, what gets pickled is platform dependent. On systems that support os.fork, like Linux, nothing is pickled here when the default fork start method is used. Both the target function and the args you're passing get inherited by the child process via fork.

On platforms that don't support fork, like Windows, the f function and args tuple will both be pickled and sent to the child process. The child process will re-import your __main__ module, and then unpickle the function and its arguments.
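To make the "pickled and sent to the child" step concrete, here is a minimal sketch that round-trips an args tuple through pickle inside a single process. It only approximates what the spawn path does (multiprocessing actually uses its own pickle.Pickler subclass internally), and `payload` is just illustrative data. The point is that the receiver ends up with an equal copy, not the parent's object.

```python
import pickle

payload = [1, 2, 3]          # illustrative argument data
args = (payload,)

# Round-trip the args tuple through pickle as a stand-in for the
# parent -> child transfer that happens under the spawn start method.
received = pickle.loads(pickle.dumps(args))

print(received == args)          # True: the contents survive the trip
print(received[0] is payload)    # False: the receiver holds its own copy

received[0].append(4)            # mutating the copy...
print(payload)                   # ...leaves the original [1, 2, 3] untouched
```

This is also why changes a spawned child makes to its arguments are not visible in the parent.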

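In either case, randomfunction is not actually pickled. When you pickle f, all you're really pickling is a reference that the child process uses to rebuild the f function object. This is usually little more than a string that tells the child how to re-import f (the output below is from Python 2, whose default protocol is text-based; Python 3's default binary protocol looks different but carries the same module-and-name information):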

>>> import pickle
>>> def f(I):
...     print('hello world!',I)
...     randomfunction()
... 
>>> pickle.dumps(f)
'c__main__\nf\np0\n.'

The child process will just re-import f and then call it. randomfunction will be accessible as long as it was properly imported into the original script to begin with; that applies equally when it is defined in another file.
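The claim that randomfunction is not pickled can be checked directly. In this sketch, `textwrap.shorten` is an arbitrary stand-in for randomfunction: a helper that lives in another module and is imported at module level. Pickling f produces a payload that never mentions the helper, and the restored f still works because it resolves `shorten` through its module's globals when it is called.

```python
import pickle
from textwrap import shorten  # stands in for 'randomfunction' from another file

def f(i):
    # 'shorten' is not captured in the pickle; it is looked up in this
    # module's globals each time f runs.
    return shorten('hello world! ' + str(i), width=40)

payload = pickle.dumps(f)
print(payload)                    # only module/name information for f
print(b'shorten' in payload)      # False: the helper is not in the payload

restored = pickle.loads(payload)  # looks f up again by module and name
print(restored(1))
```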

Note that in Python 3.4+, you can get the Windows-style behavior on Linux by using contexts:

import multiprocessing
ctx = multiprocessing.get_context('spawn')
ctx.Process(target=f, args=(I,)).start()  # even on Linux, this will use pickle

The documentation's descriptions of the start methods are also relevant here, since the spawn and fork behaviors apply to Python 2.x as well:

spawn

The parent process starts a fresh python interpreter process. The child process will only inherit those resources necessary to run the process object's run() method. In particular, unnecessary file descriptors and handles from the parent process will not be inherited. Starting a process using this method is rather slow compared to using fork or forkserver.

Available on Unix and Windows. The default on Windows.

fork

The parent process uses os.fork() to fork the Python interpreter. The child process, when it begins, is effectively identical to the parent process. All resources of the parent are inherited by the child process. Note that safely forking a multithreaded process is problematic.

Available on Unix only. The default on Unix.

forkserver

When the program starts and selects the forkserver start method, a server process is started. From then on, whenever a new process is needed, the parent process connects to the server and requests that it fork a new process. The fork server process is single threaded so it is safe for it to use os.fork(). No unnecessary resources are inherited.

Available on Unix platforms which support passing file descriptors over Unix pipes.

Note that forkserver is only available in Python 3.4+; there's no way to get that behavior on 2.x, regardless of the platform you're on.
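If you want to see which of these start methods your own interpreter offers, the following sketch only inspects the available methods and a spawn context; it deliberately starts no processes. `multiprocessing.get_all_start_methods()` and `multiprocessing.get_context()` both exist from Python 3.4 on. The exact list depends on the platform (Windows only offers 'spawn').

```python
import multiprocessing

# Start methods supported here; on Linux this typically includes
# 'fork', 'spawn' and 'forkserver'.
methods = multiprocessing.get_all_start_methods()
print(methods)

# A context pins one start method without changing the global default;
# ctx.Process(...) would then always use it.
ctx = multiprocessing.get_context('spawn')
print(ctx.get_start_method())    # spawn
```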

Upvotes: 12

gdanezis

Reputation: 629

Only the function arguments (I,) are pickled (plus the return value of f when you use a Pool rather than a bare Process). The actual definition of the function f is not pickled; it has to be available when the child process loads the module.

The easiest way to see this is with the following code:

from multiprocessing import Process

if __name__ == '__main__':
    def f(I):
        print('hello world!',I)

    for I in [1,2,3]:
        Process(target=f,args=(I,)).start()

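On Windows, the child process re-imports the module without running the if __name__ == '__main__': block, so f never gets defined there, and you get: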

AttributeError: 'module' object has no attribute 'f'
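A related, stricter case makes the same rule visible without starting any processes: pickle can only send a function that the receiving side can find again by module and name, so a function with no module-level name is refused at pickling time. The `can_pickle` helper below is hypothetical, written just for this illustration. The exact exception type has varied between Python versions, so it catches both `pickle.PicklingError` and `AttributeError`.

```python
import pickle

def module_level(i):
    return ('hello world!', i)

def make_nested():
    def nested(i):
        return ('hello world!', i)
    return nested

def can_pickle(obj):
    """Hypothetical helper: True if pickle.dumps(obj) succeeds."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError):
        return False
    return True

print(can_pickle(module_level))   # True: reachable as <module>.module_level
print(can_pickle(make_nested()))  # False: 'make_nested.<locals>.nested' has no importable name
```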

Upvotes: -1

Winston Ewert

Reputation: 45059

The function is pickled, but possibly not in the way you think:

You can look at what's actually in a pickle like this:

import pickle, pickletools
pickletools.dis(pickle.dumps(f))

I get:

 0: c    GLOBAL     '__main__ f'
12: p    PUT        0
15: .    STOP

You'll note that nothing in there corresponds to the code of the function. Instead, it has a reference to __main__ f, which is the module and name of the function. So when this is unpickled, it will always attempt to look up the f function in the __main__ module and use that. When you use the multiprocessing module, that ends up being a copy of the same function as it was in your original program.
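The same inspection can be done programmatically with `pickletools.genops`, which yields an (opcode, argument, position) triple for every instruction in a pickle. This sketch pins protocol 0 to match the text-based disassembly above; with it, the whole pickle of f should be a single GLOBAL reference plus memo bookkeeping, with none of f's code in it.

```python
import pickle
import pickletools

def f(i):
    print('hello world!', i)

# Protocol 0 matches the text-based disassembly shown above.
payload = pickle.dumps(f, protocol=0)

# genops yields (opcode, argument, position) for every instruction.
ops = [(op.name, arg) for op, arg, _pos in pickletools.genops(payload)]
for name, arg in ops:
    print(name, arg)
```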

This does mean that if you somehow change which function is located at __main__.f, you'll end up unpickling a different function than the one you pickled.
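A small single-process sketch of that pitfall, using a hypothetical function `greet`: the pickle is taken while the name points at one function, the module-level name is then rebound, and unpickling returns whatever the name refers to at load time.

```python
import pickle

def greet():
    return 'original'

payload = pickle.dumps(greet)    # records only "<module> greet"

def greet():  # rebind the same module-level name to a different function
    return 'replacement'

restored = pickle.loads(payload)
print(restored())                # replacement: the lookup happens at load time
```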

Multiprocessing brings up a complete copy of your program, complete with all the functions you defined in it, so the child can simply call those functions. The function's code isn't copied over, only its name. The pickle module assumes the function will be the same in both copies of your program, so it can just look the function up by name.

Upvotes: 3
