Reputation: 32745
I read this answer on a previous question which says:
So the peer that initiates the termination – i.e. calls close() first – will end up in the TIME_WAIT state. [...]
However, it can be a problem with lots of sockets in TIME_WAIT state on a server as it could eventually prevent new connections from being accepted. [...]
Instead, design your application protocol so the connection termination is always initiated from the client side. If the client always knows when it has read all remaining data it can initiate the termination sequence. As an example, a browser knows from the Content-Length HTTP header when it has read all data and can initiate the close. (I know that in HTTP 1.1 it will keep it open for a while for a possible reuse, and then close it.)
I'd like to implement this using TcpClient/TcpListener, but it's not clear how to make it work properly.
This is the typical way most MSDN examples illustrate it - both sides calling Close(), not just the client:
private static void AcceptLoop()
{
    listener.BeginAcceptTcpClient(ar =>
    {
        var tcpClient = listener.EndAcceptTcpClient(ar);
        ThreadPool.QueueUserWorkItem(delegate
        {
            var stream = tcpClient.GetStream();
            ReadSomeData(stream);
            WriteSomeData(stream);
            tcpClient.Close(); // <-- note: the server closes too
        });
        AcceptLoop();
    }, null);
}
private static void ExecuteClient()
{
    using (var client = new TcpClient())
    {
        client.Connect("localhost", 8012);
        using (var stream = client.GetStream())
        {
            WriteSomeData(stream);
            ReadSomeData(stream);
        }
    }
}
After I run this with 20 clients, TCPView shows a lot of sockets from both the client and the server stuck in TIME_WAIT, which take quite some time to disappear.
As per the quotes above, I removed the Close() call from my listener's handler, and now I just rely on the client closing:
var tcpClient = listener.EndAcceptTcpClient(ar);
ThreadPool.QueueUserWorkItem(delegate
{
    var stream = tcpClient.GetStream();
    ReadSomeData(stream);
    WriteSomeData(stream);
    // tcpClient.Close(); <-- Let the client close
});
AcceptLoop();
Now I no longer have any sockets in TIME_WAIT, but I do get sockets left in various stages of CLOSE_WAIT, FIN_WAIT, etc. that also take a very long time to disappear.
This time I added a delay before closing the server connection:
var tcpClient = listener.EndAcceptTcpClient(ar);
ThreadPool.QueueUserWorkItem(delegate
{
    var stream = tcpClient.GetStream();
    ReadSomeData(stream);
    WriteSomeData(stream);
    Thread.Sleep(100);  // <-- Give the client the opportunity to close first
    tcpClient.Close();  // <-- Now the server closes
});
AcceptLoop();
This seems to be better - now only the client sockets are in TIME_WAIT; the server sockets have all closed properly.
This seems to agree with what the previously linked article says:
So the peer that initiates the termination – i.e. calls close() first – will end up in the TIME_WAIT state.
My questions:

1. Which of these approaches is the right way to go, and why? (Assuming I want the client to be the 'active close' side.)
2. Is there a better way to implement approach 3? We want the close to be initiated by the client (so that the client is left with the TIME_WAITs), but when the client closes, we also want to close the connection on the server.
3. My scenario is actually opposite to a web server; I have a single client that connects and disconnects from many different remote machines. I'd rather the server have connections stuck in TIME_WAIT instead, to free up resources on my client. In this case, should I make the server perform the active close, and put the sleep/close on my client?

Full code to try it yourself is here:
https://gist.github.com/PaulStovell/a58cd48a5c6b14885cf3
Edit: another useful resource:
For a server that does establish outbound connections as well as accepting inbound connections then the golden rule is to always ensure that if a TIME_WAIT needs to occur that it ends up on the other peer and not the server. The best way to do this is to never initiate an active close from the server, no matter what the reason. If your peer times out, abort the connection with an RST rather than closing it. If your peer sends invalid data, abort the connection, etc. The idea being that if your server never initiates an active close it can never accumulate TIME_WAIT sockets and therefore will never suffer from the scalability problems that they cause. Although it's easy to see how you can abort connections when error situations occur what about normal connection termination? Ideally you should design into your protocol a way for the server to tell the client that it should disconnect, rather than simply having the server instigate an active close. So if the server needs to terminate a connection the server sends an application level "we're done" message which the client takes as a reason to close the connection. If the client fails to close the connection in a reasonable time then the server aborts the connection.
On the client things are slightly more complicated, after all, someone has to initiate an active close to terminate a TCP connection cleanly, and if it's the client then that's where the TIME_WAIT will end up. However, having the TIME_WAIT end up on the client has several advantages. Firstly if, for some reason, the client ends up with connectivity issues due to the accumulation of sockets in TIME_WAIT it's just one client. Other clients will not be affected. Secondly, it's inefficient to rapidly open and close TCP connections to the same server so it makes sense beyond the issue of TIME_WAIT to try and maintain connections for longer periods of time rather than shorter periods of time. Don't design a protocol whereby a client connects to the server every minute and does so by opening a new connection. Instead use a persistent connection design and only reconnect when the connection fails, if intermediary routers refuse to keep the connection open without data flow then you could either implement an application level ping, use TCP keep alive or just accept that the router is resetting your connection; the good thing being that you're not accumulating TIME_WAIT sockets. If the work that you do on a connection is naturally short lived then consider some form of "connection pooling" design whereby the connection is kept open and reused. Finally, if you absolutely must open and close connections rapidly from a client to the same server then perhaps you could design an application level shutdown sequence that you can use and then follow this with an abortive close. Your client could send an "I'm done" message, your server could then send a "goodbye" message and the client could then abort the connection.
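For reference, an abortive close like the one described above (an RST instead of the normal FIN sequence) can be produced in .NET by enabling a zero-timeout linger before closing. A minimal sketch, assuming a connected Socket (the AbortConnection helper name is mine, not from the article):

using System.Net.Sockets;

// Hypothetical helper: abort a connection with an RST instead of a FIN.
// No TIME_WAIT is left on this side; per the advice above, use this only
// after an application-level shutdown handshake or on a misbehaving peer.
static void AbortConnection(Socket socket)
{
    // A zero-second linger makes Close() discard any unsent data and send
    // an RST rather than performing the normal FIN/ACK termination sequence.
    socket.LingerState = new LingerOption(true, 0);
    socket.Close();
}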
Upvotes: 18
Views: 8967
Reputation: 4489
Paul, you've done some great research yourself. I've been working in this area a bit too. One article I've found very useful on the topic of TIME_WAIT is this:
http://vincent.bernat.im/en/blog/2014-tcp-time-wait-state-linux.html
It has a few Linux-specific concerns but all the TCP-level content is universal.
Ultimately, both sides should close (i.e. complete the FIN/ACK handshake), as you don't want FIN_WAIT or CLOSE_WAIT states lingering; that's just "bad" TCP. I'd avoid using RST to force-close connections, as that's likely to cause problems elsewhere and just feels like being a poor netizen.
It is true that the TIME_WAIT state will happen on the end that terminates the connection first (i.e. sends the first FIN packet), and that you should optimize by closing the connection first on the end that has the least connection churn.
On Windows you'll have about 16,000 dynamic TCP ports (49152-65535) available per IP by default, so you'd need decent connection churn to hit that limit. The memory for the TCBs that track the TIME_WAIT states should be quite acceptable.
https://support.microsoft.com/kb/929851
It's also important to note that a TCP connection can be half-closed. That is, one end can choose to close the connection for sending but leave it open for receiving. In .NET this is done like this:
tcpClient.Client.Shutdown(SocketShutdown.Send);
http://msdn.microsoft.com/en-us/library/system.net.sockets.socket.shutdown.aspx
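Put together, a client-side graceful shutdown using a half-close might look like this - a minimal sketch, assuming a connected TcpClient (GracefulClientClose is an illustrative name, not a framework method):

using System.Net.Sockets;

// Hypothetical client-side graceful shutdown via half-close.
static void GracefulClientClose(TcpClient tcpClient)
{
    var stream = tcpClient.GetStream();

    // Half-close: tell the server we have nothing more to send;
    // the socket remains open for receiving.
    tcpClient.Client.Shutdown(SocketShutdown.Send);

    // Drain whatever the server still sends; Read returns 0 once the
    // server has closed its side (sent its FIN).
    var buffer = new byte[4096];
    while (stream.Read(buffer, 0, buffer.Length) > 0)
    {
        // Discard (or process) any remaining data.
    }

    // Both directions are now shut down; release the socket.
    tcpClient.Close();
}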
I found this necessary when porting part of the netcat tool from Linux to PowerShell:
http://www.powershellmagazine.com/2014/10/03/building-netcat-with-powershell/
I must reiterate the advice that if you can keep a connection open and idle until you need it again, it typically has a massive impact on reducing TIME_WAITs.
Beyond that, try to measure when the TIME_WAITs actually become a problem - it really takes a lot of connection churn to exhaust the TCP resources.
I hope some of this is helpful.
Upvotes: 3
Reputation: 63772
This is just how TCP works; you can't avoid it. You can set different timeouts for the TIME_WAIT or FIN_WAIT states on your server, but that's about it.
The reason for this is that with TCP, a packet can arrive at a socket you closed a long time ago. If you already have another socket open on the same IP and port, it would receive data meant for the previous session, which would confuse the hell out of it. Especially given that most people consider TCP to be reliable :)
If both your client and server implement TCP properly (for example, handling the clean shutdown correctly), it doesn't really matter whether the client or the server closed the connection. Since it sounds like you manage both sides, it shouldn't be an issue.
Your issue seems to be with the proper shutdown on the server. When one side of the socket closes, the other side's Read completes with a length of 0 - that's your message that the communication is over. You're most likely ignoring this in your server code - it's a special case that says "you can now safely dispose of this socket, do it now".
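For instance, a server-side handler can watch for that zero-length read and dispose of the socket at that point - a minimal sketch, assuming the tcpClient from an accept loop like the one in the question (HandleClient is an illustrative name):

using System.Net.Sockets;

// Hypothetical handler: treat a zero-length Read as the client's
// end-of-communication signal and dispose of the socket right away.
static void HandleClient(TcpClient tcpClient)
{
    using (tcpClient)
    using (var stream = tcpClient.GetStream())
    {
        var buffer = new byte[4096];
        int read;
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Process the 'read' bytes received from the client here.
        }
        // read == 0: the client has closed its side. Leaving the using
        // blocks closes our side promptly instead of letting the server
        // socket sit in CLOSE_WAIT.
    }
}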
In your case, a server-side close seems like the best fit.
But really, TCP is rather complicated. It doesn't help that most of the samples on the internet are severely flawed (especially the C# samples - it's not too hard to find a good sample for C++, for example) and ignore many of the important parts of the protocol. I've got a simple sample of something that might work for you: https://github.com/Luaancz/Networking/tree/master/Networking%20Part%201 It still isn't perfect TCP, but it's much better than the MSDN samples.
Upvotes: 4
Reputation: 3833
Which of these approaches is the right way to go, and why? (Assuming I want the client to be the 'active close' side)
In the ideal situation, you would have your server send a RequestDisconnect packet with a certain opcode to the client, which the client then handles by closing the connection upon receiving it. That way, you don't end up with stale sockets on the server's side (sockets are resources, and resources are finite, so keeping stale ones around is a bad thing).
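A minimal sketch of that idea, assuming a hypothetical one-byte protocol where the RequestDisconnect opcode value is purely illustrative:

using System.Net.Sockets;

// Hypothetical opcode the server sends to ask the client to disconnect.
const byte RequestDisconnect = 0xFF;

// Server side: ask the client to initiate the close.
static void RequestClientDisconnect(NetworkStream stream)
{
    stream.WriteByte(RequestDisconnect);
}

// Client side: close the connection when the server asks, so the
// TIME_WAIT ends up on the client rather than the server.
static void ClientReadLoop(TcpClient client)
{
    var stream = client.GetStream();
    int b;
    while ((b = stream.ReadByte()) != -1)
    {
        if (b == RequestDisconnect)
        {
            client.Close(); // the client performs the active close
            return;
        }
        // Otherwise handle 'b' as part of the normal protocol.
    }
}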
If the client then executes its disconnect sequence by disposing the socket (or calling Close() if you're using a TcpClient), it will put the socket into the CLOSE_WAIT state on the server, which means the connection is in the process of being closed.
Is there a better way to implement approach 3? We want the close to be initiated by the client (so that the client is left with the TIME_WAITs), but when the client closes, we also want to close the connection on the server.
Yes - same as above: have the server send a packet requesting that the client close its connection to the server.
My scenario is actually opposite to a web server; I have a single client that connects and disconnects from many different remote machines. I'd rather the server have connections stuck in TIME_WAIT instead, to free up resources on my client. In this case, should I make the server perform the active close, and put the sleep/close on my client?
Yes, if that's what you're after, feel free to call Dispose on the server to get rid of the clients.
On a side note, you may want to look into using raw Socket objects rather than TcpClient, which is extremely limited. If you operate on sockets directly, you have SendAsync and all the other asynchronous socket methods at your disposal. Manual Thread.Sleep calls are something I'd avoid - these operations are asynchronous in nature, so disconnecting after you've written something to a stream should be done in the completion callback of SendAsync rather than after a Sleep.
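As a sketch of that pattern (assuming a connected Socket and a payload that is the final message of the exchange; SendFinalMessageThenClose is an illustrative name), the close belongs in the completion callback rather than after a Sleep:

using System.Net.Sockets;

// Hypothetical: send the final payload, then close in the completion
// callback instead of sleeping for an arbitrary interval.
static void SendFinalMessageThenClose(Socket socket, byte[] payload)
{
    var args = new SocketAsyncEventArgs();
    args.SetBuffer(payload, 0, payload.Length);
    args.Completed += (sender, e) => ((Socket)sender).Close();

    // SendAsync returns false when the operation completed synchronously,
    // in which case the Completed event will not be raised.
    if (!socket.SendAsync(args))
    {
        socket.Close();
    }
}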
Upvotes: 1