What really happened when socket recv/send data in Python

Question

I have read this from Socket Programming HOWTO in Python Documentation

you can transform your client socket into a file-like beast and use read and write. ... except to warn you that you need to use flush on sockets. These are buffered “files”, and a common mistake is to write something, and then read for a reply. Without a flush in there, you may wait forever for the reply, because the request may still be in your output buffer.

Now we come to the major stumbling block of sockets - send and recv operate on the network buffers

socket object in Python is a file descriptor, and you can use makefile() to get a file object associated with the socket.

According to the warning,

you need to use flush on sockets. These are buffered “files”...Without a flush in there, you may wait forever for the reply, because the request may still be in your output buffer...send and recv operate on the network buffers

I think when socket send/recv, there are actually two buffers: "file buffer" and "network buffer". If you transform socket to file like object and use write(data), first, data is written into "file output buffer", then into "network send buffer" by using flush. All this can explain the warning in the documentation: use flush after write or the read may block forever.

I drew a picture to show my opinion about underlying "two buffers" for socket.

socket transfer data model

So my question is how to understand the quote above? Is my "two buffers" model to understand right? Hope your reply, thank you!

Gil Hamilton · Accepted Answer

Yes, your model is largely correct. It may be helpful to understand that the "network buffers" being referred to in that quote reside in the operating system (i.e. not within the bounds of your process address space), while the "file buffers" are actually implemented within your process by the python runtime. This is why flush is required: the boundary between file buffers and network buffers in your diagram is essentially the operating system's "system call interface".

In other words, when you call socket.send, the data bytes from your buffer are transferred directly into the operating system network buffers (subject to space availability). They will then be sent to the network peer according to standard network mechanisms (TCP, etc). However, when you use makefile you are essentially building a buffering mechanism around this. When you write to the "file-like object", bytes are simply transferred to a hidden buffer maintained in association with the file (but still within your process' address space). Calling flush then does the equivalent of socket.send; moves those bytes into the operating system's buffers for transmission.

There are two cases where one would typically use makefile: (1) you have some other existing code that expects a file-like object that you wish to use in order to construct the bytes to be sent/received via the network, or (2) you want the buffering behavior, say, for performance reasons (of course, you could always implement such buffering yourself using str or bytes objects, but it is often more convenient to simply write to a file-like object).

What really happened when socket recv/send data in Python

Answers (1)

Related Questions