Reputation: 4198
I am developing a 9p server, which is much like an NFS server. Repeated mounting and unmounting leaks no socket descriptors because I am able to close the socket. However, in the following scenario the server does not clean up properly and close the socket: a client on Machine A mounts a file system from the server machine, and then, for some reason, Machine A restarts or is shut down. When this happens, I expect the server to clean up and close the socket, but instead it blocks on read(). I thought read() should return 0 when a connection is closed, but it doesn't. I assume that's because a proper TCP termination has not occurred, so the server is still waiting for data from the client. Here is the pseudocode of my server:
while (1) {
    n = read(sockfd, buffer, 4); // 4-byte protocol header that specifies the message size
    if (n <= 0) break; // 0: the peer closed the connection; < 0: read error
    /* iteratively read the rest of the bytes until the incoming message ends */
}
cleanup(); // close the socket and do some other tasks
However, when the client restarts while the server is blocked on read(), nothing happens. What is the best and easiest way to solve this? Some people suggest running a separate thread that checks connections but this is too involved. I am sure there must be a faster way.
Upvotes: 2
Views: 332
Reputation: 171178
I want to amend Zaboj Campula's good answer with the most important way to deal with this: timeouts. Normally, you would assign a timeout to every socket operation. A typical value is 30 seconds. That way there is no need for keepalives most of the time: a connection failure will be detected within 30 seconds.
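One way to put a timeout on the blocking read() from the question is the standard SO_RCVTIMEO socket option. The following is only a minimal sketch: the helper names set_read_timeout and read_with_timeout are made up for illustration, and the 30-second value would be passed in by the caller. It also assumes that a single read() is enough for the example; a real server would still loop until the whole header has arrived.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: make blocking reads on sockfd give up after `seconds`. */
int set_read_timeout(int sockfd, int seconds)
{
    struct timeval tv;
    tv.tv_sec = seconds;
    tv.tv_usec = 0;
    return setsockopt(sockfd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}

/*
 * Hypothetical helper wrapping one read():
 *   > 0  number of bytes read
 *     0  the peer closed the connection in an orderly way
 *    -1  the timeout expired (no data arrived in time)
 *    -2  some other error; errno is left as set by read()
 */
int read_with_timeout(int sockfd, void *buf, size_t len)
{
    ssize_t n = read(sockfd, buf, len);
    if (n >= 0)
        return (int)n;
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return -1; /* SO_RCVTIMEO expired */
    return -2;
}
```

In the loop from the question, any result of 0 or below would lead to cleanup(). Note that a mounted file system can be idle for a long time without being broken, so the timeout has to be longer than the longest idle period the server is willing to tolerate.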
"Some people suggest running a separate thread that checks connections but this is too involved."
That does not work because your machine does not know that the connection is gone. There is nothing to check.
Upvotes: 0
Reputation: 3360
When the client shuts down cleanly, the OS on the client terminates all of its TCP connections. But when the client crashes, is switched off, or a network problem occurs somewhere on the path between the client and the server, there is no way to deliver that information to the server, and the server may stay blocked in the read() call forever.
There are two possible solutions: either use the standard TCP keepalive probes or implement an application-level health check.
TCP keepalive is well described, for example, at http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/overview.html:
In order to understand what TCP keepalive (which we will just call keepalive) does, you need do nothing more than read the name: keep TCP alive. This means that you will be able to check your connected socket (also known as TCP sockets), and determine whether the connection is still up and running or if it has broken...
If you want your application to use TCP keepalive, just set the socket option (error checking is omitted):
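int optval = 1; /* 1 enables keepalive probes, 0 disables them */
socklen_t optlen = sizeof(optval);
/* sockfd is the connected socket; naming the variable "socket" would shadow socket() */
setsockopt(sockfd, SOL_SOCKET, SO_KEEPALIVE, &optval, optlen);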
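TCP keepalive is easy to use, but its default timing comes from the OS configuration, which is system-wide (on Linux, the first probe is sent only after two hours of idle time by default). Some systems, Linux among them, also let an application override the timing per socket with the TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT options.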
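On Linux specifically, the keepalive timing can be set per socket rather than system-wide. The sketch below assumes a Linux target (TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are not available under these names everywhere), and the function name enable_keepalive and its parameters are invented for illustration.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Hypothetical helper, Linux only: enable keepalive on a TCP socket and
 * override the system-wide timing for this socket.
 *   idle_secs      idle time before the first probe is sent
 *   interval_secs  time between unanswered probes
 *   probe_count    unanswered probes before the connection is declared dead
 * Returns 0 on success, -1 if any setsockopt() call fails.
 */
int enable_keepalive(int sockfd, int idle_secs, int interval_secs, int probe_count)
{
    int on = 1;

    if (setsockopt(sockfd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    if (setsockopt(sockfd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_secs, sizeof(idle_secs)) < 0)
        return -1;
    if (setsockopt(sockfd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_secs, sizeof(interval_secs)) < 0)
        return -1;
    if (setsockopt(sockfd, IPPROTO_TCP, TCP_KEEPCNT, &probe_count, sizeof(probe_count)) < 0)
        return -1;
    return 0;
}
```

With a call such as enable_keepalive(sockfd, 60, 10, 5), a dead client should be noticed after roughly 60 + 5 * 10 seconds of silence. At that point the blocked read() returns -1 (typically with errno set to ETIMEDOUT) instead of 0, so the server loop has to treat a negative return value as a reason to clean up.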
Use an application-level mechanism when you need application-specific timeouts for detecting disconnection. There are plenty of ways to implement it. The idea is to periodically send a small piece of otherwise useless data and to assume the connection is dead when it is not received.
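A rough sketch of such a health check is shown here. Everything in it is an assumption made for illustration: the one-byte heartbeat message, the names send_heartbeat and wait_for_traffic, and the rule that a peer is considered dead after a given number of missed intervals. A real protocol such as 9p would need a message that fits its own framing. MSG_NOSIGNAL is used so that writing to a broken connection returns an error instead of raising SIGPIPE; that flag is available on Linux.

```c
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

#define HEARTBEAT_BYTE 0x00 /* hypothetical "I am alive" message */

/* Sender side: called once per heartbeat interval. Returns 0 on success, -1 on failure. */
int send_heartbeat(int sockfd)
{
    unsigned char beat = HEARTBEAT_BYTE;
    ssize_t n = send(sockfd, &beat, 1, MSG_NOSIGNAL);
    return n == 1 ? 0 : -1;
}

/*
 * Receiver side: wait until the socket has something to report.
 *   1  data is ready, or a hangup/error is pending that the next read() will report
 *   0  nothing arrived for interval_ms * missed_allowed milliseconds: assume the peer is dead
 *  -1  poll() itself failed
 */
int wait_for_traffic(int sockfd, int interval_ms, int missed_allowed)
{
    struct pollfd pfd;
    int rc;

    pfd.fd = sockfd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    rc = poll(&pfd, 1, interval_ms * missed_allowed);
    if (rc > 0)
        return 1;
    return rc; /* 0 on timeout, -1 on error */
}
```

The receiver would call wait_for_traffic() before each read() and run its cleanup when the result is 0 or -1. Any regular message counts as a sign of life, so the sender only needs to emit heartbeats while it has nothing else to send.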
Upvotes: 2