Reputation: 4681
This code works fine on Linux, but fails under Windows (which is expected). I know that the multiprocessing module uses fork() to spawn a new process, so the file descriptors owned by the parent (i.e. the open socket) are inherited by the child. However, it was my understanding that any data you send via multiprocessing needs to be pickleable. On both Windows and Linux, the socket object is not pickleable.
```python
from socket import socket, AF_INET, SOCK_STREAM
import multiprocessing as mp
import pickle


def foo(obj):
    print("Received: {}".format(type(obj)))
    data, done = [], False
    while not done:
        tmp = obj.recv(1024)
        # Crude heuristic: treat a short read as the end of the response.
        done = len(tmp) < 1024
        data.append(tmp)
    data = b"".join(data)
    print(data.decode())


if __name__ == "__main__":
    # The guard is required wherever the child re-imports this module
    # (the "spawn" start method, the only one available on Windows).
    sock = socket(AF_INET, SOCK_STREAM)
    sock.connect(("www.python.org", 80))
    sock.sendall(b"GET / HTTP/1.1\r\nHost: www.python.org\r\n\r\n")

    try:
        pickle.dumps(sock)
    except TypeError:
        print("sock is not pickleable")

    proc = mp.Process(target=foo, args=(sock,))
    proc.start()
    proc.join()
```
My question is: why can a socket object, a demonstrably non-pickleable object, be passed to a child process with multiprocessing? Does it not use pickle, as it does on Windows?
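A network-free variant of the same experiment may make the behaviour easier to reproduce. The sketch below assumes a POSIX system (the "fork" start method does not exist on Windows) and uses socket.socketpair() as a stand-in for the HTTP connection; run_demo and child are illustrative names. With "fork", the child is a copy of the parent, so the socket in args is inherited rather than pickled, even though plain pickle.dumps rejects it.

```python
import multiprocessing as mp
import pickle
import socket


def child(sock):
    # Runs in the forked child; `sock` is the parent's socket object,
    # copied along with the rest of the parent's memory.
    sock.sendall(b"hello from child")
    sock.close()


def run_demo():
    parent_end, child_end = socket.socketpair()

    # Plain pickle refuses sockets, exactly as in the question.
    try:
        pickle.dumps(child_end)
        plain_pickle_ok = True
    except TypeError:
        plain_pickle_ok = False

    # With "fork" (POSIX only), Process args are inherited, not pickled.
    ctx = mp.get_context("fork")
    proc = ctx.Process(target=child, args=(child_end,))
    proc.start()
    proc.join()
    child_end.close()  # drop the parent's copy so recv() can see EOF

    chunks = []
    while True:
        chunk = parent_end.recv(1024)
        if not chunk:  # b"" means every copy of child_end is closed
            break
        chunks.append(chunk)
    parent_end.close()
    return plain_pickle_ok, b"".join(chunks)


if __name__ == "__main__":
    print(run_demo())
```

Reading until recv() returns b"" is also a more reliable end-of-data test than the short-read heuristic in foo above.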
Upvotes: 11
Views: 781
Reputation: 35247
I think the issue is that multiprocessing uses a different pickler on Windows and non-Windows systems. On Windows there is no real fork(), so the pickling that is done is equivalent to pickling across machine boundaries (i.e. distributed computing). On non-Windows systems, objects such as file descriptors can be shared across process boundaries. Thus, pickling on Windows (with pickle) is more limited.
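The difference between the two process-creation models can be seen without sockets at all. This sketch assumes a POSIX system (only "spawn" exists on Windows) and passes an unpicklable lambda as a Process argument: with "spawn", which is what Windows uses, the parent must pickle the arguments, so start() fails; with "fork", the child inherits them and runs normally. try_start and call are hypothetical helper names, and the exact exception type may vary between Python versions.

```python
import multiprocessing as mp
import pickle


def call(fn):
    # Target used with both start methods; it just invokes its argument.
    fn()


def try_start(method):
    """Start a child whose argument is an unpicklable lambda.

    Returns "ok" if the child ran and exited cleanly, otherwise the name of
    the exception raised while the parent prepared the child.
    """
    ctx = mp.get_context(method)
    proc = ctx.Process(target=call, args=(lambda: None,))
    try:
        proc.start()
    except (pickle.PicklingError, AttributeError, TypeError) as exc:
        return type(exc).__name__
    proc.join()
    return "ok" if proc.exitcode == 0 else "exitcode {}".format(proc.exitcode)


if __name__ == "__main__":
    print("available:", mp.get_all_start_methods())
    print("spawn:", try_start("spawn"))  # args are pickled, so this fails
    print("fork:", try_start("fork"))    # POSIX only; args are inherited
```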
The multiprocessing package does use copy_reg (copyreg in Python 3) to register a few object types with pickle, and one of those types is socket. However, the serialization of socket objects that multiprocessing uses on Windows is more limited, because the Windows pickler is weaker.
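In CPython 3, the socket registration lives in multiprocessing.reduction, and its pickler can be exercised directly. This sketch relies on an undocumented implementation detail (ForkingPickler.dumps), so it may change between versions; compare_picklers is an illustrative name. It shows that plain pickle rejects a socket while ForkingPickler serializes it.

```python
import pickle
import socket
from multiprocessing.reduction import ForkingPickler


def compare_picklers(sock):
    """Return (plain_pickle_ok, forking_pickler_ok) for a socket object."""
    try:
        pickle.dumps(sock)
        plain_ok = True
    except TypeError:
        plain_ok = False

    # ForkingPickler's dispatch_table includes the reducers multiprocessing
    # registers, one of which handles socket.socket. On Unix that reducer
    # keeps a duplicate of the fd in a background "resource sharer" so that
    # another process can request it later (CPython implementation detail).
    payload = ForkingPickler.dumps(sock)
    return plain_ok, len(payload) > 0


if __name__ == "__main__":
    a, b = socket.socketpair()
    print(compare_picklers(a))  # plain pickle fails, ForkingPickler does not
    a.close()
    b.close()
```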
On a related note, if you do want to send a socket object with multiprocessing on Windows, you can; you just have to use the package multiprocess, which uses dill instead of pickle. dill has a better serializer that can pickle socket objects on any OS, so sending the socket object with multiprocess works in either case.
dill has the function copy, which is essentially loads(dumps(object)) and is useful for checking that an object can be serialized. dill also has check, which performs copy but with the more restrictive "Windows"-style fork-like operation. This allows users on non-Windows systems to emulate a copy on a Windows system, or across distributed resources.
```python
>>> import dill
>>> import socket
>>> s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
>>> s.connect(('www.python.org', 80))
>>> s.sendall(b'GET / HTTP/1.1\r\nHost: www.python.org\r\n\r\n')
>>>
>>> dill.copy(s)
<socket._socketobject object at 0x10e55b9f0>
>>> dill.check(s)
<socket._socketobject object at 0x1059628a0>
>>>
```
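dill is a third-party package, so the sketch below approximates the two ideas with the standard library pickle module only. copy_like and check_like are hypothetical helpers, not dill's API; check_like simply unpickles in a fresh interpreter, which is roughly what a child started without fork() sees, and does not reproduce what dill.check does internally.

```python
import pickle
import socket
import subprocess
import sys


def copy_like(obj):
    """Round-trip obj through pickle in this process: loads(dumps(obj))."""
    return pickle.loads(pickle.dumps(obj))


def check_like(obj):
    """Pickle obj here, then try to unpickle it in a brand-new interpreter.

    The new interpreter shares no memory with this one. Returns True if it
    could load the payload, False otherwise.
    """
    try:
        payload = pickle.dumps(obj)
    except (TypeError, AttributeError, pickle.PicklingError):
        return False  # cannot even be serialized in this process
    loader = "import pickle, sys; pickle.loads(sys.stdin.buffer.read())"
    result = subprocess.run(
        [sys.executable, "-c", loader],
        input=payload,
        capture_output=True,
    )
    return result.returncode == 0


if __name__ == "__main__":
    print(copy_like({"a": [1, 2]}))  # {'a': [1, 2]}
    print(check_like([1, 2, 3]))      # True
    s = socket.socket()
    print(check_like(s))              # False: plain pickle rejects sockets
    s.close()
```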
In short, the difference is caused by the pickler that multiprocessing uses on Windows being different from the pickler it uses on non-Windows systems. However, it is possible (and easy) to have it work on any OS by using a better serializer (as is used in multiprocess).
Upvotes: 2
Reputation: 69052
On Unix platforms, sockets and other file descriptors can be sent to a different process using Unix domain (AF_UNIX) sockets, so sockets can be pickled in the context of multiprocessing.
Instead of a regular pickler, the multiprocessing module uses a special pickler class, ForkingPickler, to pickle sockets and file descriptors, which can then be unpickled in a different process. This is only possible because it is known where the pickled instance will be unpickled; it wouldn't make sense to pickle a socket or file descriptor and send it across machine boundaries.
On Windows, there are similar mechanisms for open file handles.
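For the Unix case, the fd-passing mechanism can be observed by sending a socket to a process that is already running, so inheritance through fork() cannot explain it. The sketch assumes a POSIX system: the "fork" context is used only to start the worker, and the socket pair is created afterwards and sent through a multiprocessing Pipe, whose send() pickles it with ForkingPickler. worker and run_demo are illustrative names; the fd transfer itself happens inside CPython's multiprocessing internals.

```python
import multiprocessing as mp
import socket


def worker(conn):
    # Unpickling the socket makes this process ask the parent for a
    # duplicate of the fd, which arrives over an AF_UNIX connection.
    sock = conn.recv()
    sock.sendall(b"sent from worker")
    sock.close()
    conn.close()


def run_demo():
    ctx = mp.get_context("fork")  # POSIX only; used just to start the worker
    parent_conn, child_conn = ctx.Pipe()
    proc = ctx.Process(target=worker, args=(child_conn,))
    proc.start()

    # Created after the worker started, so it cannot have been inherited.
    a, b = socket.socketpair()
    a.settimeout(10)  # fail loudly instead of hanging if the worker breaks
    parent_conn.send(b)  # pickled with ForkingPickler
    b.close()

    chunks = []
    while True:
        chunk = a.recv(1024)
        if not chunk:  # every copy of b (parent, sharer, worker) is closed
            break
        chunks.append(chunk)
    a.close()
    proc.join()
    parent_conn.close()
    return b"".join(chunks)


if __name__ == "__main__":
    print(run_demo())
```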
Upvotes: 7