Goodies

Reputation: 4681

Why can Linux accept sockets in multiprocessing?

The code below works fine on Linux but fails on Windows (which is expected). I know that on Linux the multiprocessing module uses fork() to spawn a new process, so the file descriptors owned by the parent (i.e. the opened socket) are inherited by the child. However, it was my understanding that any data you send via multiprocessing needs to be pickleable. On both Windows and Linux, the socket object is not pickleable.

from socket import socket, AF_INET, SOCK_STREAM
import multiprocessing as mp
import pickle

sock = socket(AF_INET, SOCK_STREAM)
sock.connect(("www.python.org", 80))
sock.sendall(b"GET / HTTP/1.1\r\nHost: www.python.org\r\n\r\n")

try:
    pickle.dumps(sock)
except TypeError:
    print("sock is not pickleable")

def foo(obj):
    print("Received: {}".format(type(obj)))
    data, done = [], False
    while not done:
        tmp = obj.recv(1024)
        done = len(tmp) < 1024
        data.append(tmp)
    data = b"".join(data)
    print(data.decode())


proc = mp.Process(target=foo, args=(sock,))
proc.start()
proc.join()

My question is: why can a socket object, a demonstrably non-pickleable object, be passed in with multiprocessing? Does it not use pickle on Linux as it does on Windows?

Upvotes: 11

Views: 781

Answers (2)

Mike McKerns

Reputation: 35247

I think the issue is that multiprocessing uses a different pickler for Windows and non-Windows systems. On Windows, there is no real fork(), and the pickling that is done is equivalent to pickling across machine boundaries (i.e. distributed computing). On non-Windows systems, objects (like file descriptors) can be shared across process boundaries. Thus, pickling on Windows systems (with pickle) is more limited.
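The sharing of file descriptors across a fork can be sketched in a few lines. This is a minimal illustration, assuming a Unix-like system where the "fork" start method is available; `child_send` and `send_via_forked_child` are hypothetical names, not part of multiprocessing. With this start method the child is a copy of the parent process, so the socket in `args` is inherited rather than serialized.

```python
import multiprocessing as mp
import socket


def child_send(sock, payload):
    # Runs in the forked child: the socket object was inherited from the parent.
    sock.sendall(payload)
    sock.close()


def send_via_forked_child(payload):
    """Hypothetical helper: let a forked child write to an inherited socket."""
    ctx = mp.get_context("fork")  # assumption: Unix-like OS with fork available
    parent_end, child_end = socket.socketpair()
    proc = ctx.Process(target=child_send, args=(child_end, payload))
    proc.start()
    child_end.close()  # the parent no longer needs its copy of the child's end

    # Read until the child closes its end of the socket pair.
    chunks = []
    while True:
        chunk = parent_end.recv(1024)
        if not chunk:
            break
        chunks.append(chunk)

    proc.join()
    parent_end.close()
    return b"".join(chunks)


if __name__ == "__main__":
    print(send_via_forked_child(b"hello from the child"))
```

On Windows, `mp.get_context("fork")` raises a ValueError because that start method does not exist there.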

The multiprocessing package does use copy_reg (named copyreg in Python 3) to register reducers for a few object types, and one of those types is a socket. However, the serialization of the socket object that is used on Windows is more limited because the pickler used on Windows is weaker.

On a related note, if you do want to send a socket object with multiprocessing on Windows, you can; you just have to use the multiprocess package, which uses dill instead of pickle. dill has a better serializer that can pickle socket objects on any OS, and thus sending the socket object with multiprocess works in either case.

dill has the function copy, which is essentially loads(dumps(object)) and is useful for checking that an object can be serialized. dill also has check, which performs a copy but with the more restrictive "Windows"-style fork-like operation. This allows users on non-Windows systems to emulate a copy on a Windows system, or across distributed resources.

>>> import dill
>>> import socket
>>> s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
>>> s.connect(('www.python.org', 80))
>>> s.sendall(b'GET / HTTP/1.1\r\nHost: www.python.org\r\n\r\n')
>>> 
>>> dill.copy(s)
<socket._socketobject object at 0x10e55b9f0>
>>> dill.check(s)
<socket._socketobject object at 0x1059628a0>
>>> 

In short, the difference is caused by the pickler that multiprocessing uses on Windows being different from the pickler it uses on non-Windows systems. However, it is possible (and easy) to have it work on any OS by using a better serializer (as is used in multiprocess).

Upvotes: 2

mata

Reputation: 69052

On Unix platforms, sockets and other file descriptors can be sent to a different process over Unix domain (AF_UNIX) sockets, so sockets can be pickled in the context of multiprocessing.
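A sketch of this kind of descriptor passing, assuming Python 3.9+ on a Unix-like system (where `socket.send_fds` and `socket.recv_fds` are available). Both ends live in one process here to keep it short, and `pass_fd_over_unix_socket` is a hypothetical helper name.

```python
import os
import socket


def pass_fd_over_unix_socket(fd):
    """Hypothetical helper: send fd across an AF_UNIX pair, return the received copy."""
    left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        # The descriptor travels as ancillary data next to one byte of payload.
        socket.send_fds(left, [b"x"], [fd])
        _data, fds, _flags, _addr = socket.recv_fds(right, 1, 1)
        return fds[0]
    finally:
        left.close()
        right.close()


if __name__ == "__main__":
    read_end, write_end = os.pipe()
    received = pass_fd_over_unix_socket(read_end)
    os.write(write_end, b"hello")
    # The received descriptor refers to the same pipe as read_end.
    print(os.read(received, 5))
    for fd in (read_end, write_end, received):
        os.close(fd)
```

The receiver gets a new descriptor number that refers to the same underlying open file, which is why the same mechanism works between two separate processes.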

Instead of a regular pickler, the multiprocessing module uses a special pickler, ForkingPickler, to pickle sockets and file descriptors so that they can be unpickled in a different process. This is only possible because it is known where the pickled instance will be unpickled; it wouldn't make sense to pickle a socket or file descriptor and send it across machine boundaries.
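To see the difference between the two picklers, the following sketch tries both on the same socket. It assumes Python 3, where the class can be imported from `multiprocessing.reduction`; `pickles_with` is a hypothetical helper. The socket is left unconnected so no network access is needed.

```python
import pickle
import socket
from multiprocessing.reduction import ForkingPickler


def pickles_with(dumps, obj):
    """Hypothetical helper: report whether dumps(obj) succeeds."""
    try:
        dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


if __name__ == "__main__":
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print("plain pickle:  ", pickles_with(pickle.dumps, sock))
    print("ForkingPickler:", pickles_with(ForkingPickler.dumps, sock))
    sock.close()
```

The bytes produced by ForkingPickler are only meaningful to another process on the same machine that can reach this one, which matches the point about machine boundaries.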

On Windows there are similar mechanisms for open file handles.

Upvotes: 7
