How to prevent Windows error 10054 when unplugging network cable

I have a C++ application that uses a TCP socket to transfer a file from one processor to another. The applications will be running on an unreliable network, so it is important for transfers to continue when connectivity is lost and regained. I am using ACE to allow the application to run on Windows or Linux.

Currently, when I start a transfer and break the network connection between the two processors, if I reconnect it in less than about 20 seconds, the transfer picks back up and everything works fine. If the connection is not reestablished within 20 seconds, I get Windows error 10054 indicating the connection has been reset. At that point, the socket is gone and the transfer will not resume once connectivity is reestablished. Is there a way to override that so that I am in control of when the connection gets timed out?

Edit: This seems to be a Windows issue. I tried sending a file from a Linux VM to a Windows box. I disconnected the network cable for over 5 minutes during the transfer. When I reconnected it, the transfer picked up right where it left off and completed.

Upvotes: 1

Views: 1865

Answers (2)

selbie

Reputation: 104474

While there are probably some socket options you could experiment with, the right solution is likely this:

Build reconnect logic into whatever protocol you have implemented on top of TCP.

You will always have socket disconnects to deal with, even if you glue the network cable into the port. So you might as well focus on making your sockets periodically try to reconnect, and make your protocol aware of this situation (e.g., try to reconnect every N seconds, possibly waiting longer each time up to some maximum timeout). For the protocol-specific changes, I would work from the premise that the receiver should tell the sender from what point to start transferring data, similar to how a web browser tells an HTTP server where to resume a file transfer.
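A minimal sketch of the retry schedule described above, assuming the real connect call (ACE, Winsock, or anything else) is wrapped in a callback. The names `reconnect_with_backoff`, `next_delay`, and `try_connect` are made up for illustration and are not part of ACE or any socket API.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

using std::chrono::milliseconds;

// Doubles the delay after a failed attempt, capped at max_delay.
inline milliseconds next_delay(milliseconds current, milliseconds max_delay) {
    return std::min(current * 2, max_delay);
}

// Calls try_connect until it succeeds or max_attempts is used up.
// try_connect is a hypothetical callback wrapping the real connect call;
// it returns true on success.
// Returns the number of attempts made, or 0 if every attempt failed.
inline int reconnect_with_backoff(const std::function<bool()>& try_connect,
                                  milliseconds initial_delay,
                                  milliseconds max_delay,
                                  int max_attempts) {
    assert(max_attempts > 0 && initial_delay <= max_delay);
    milliseconds delay = initial_delay;
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        if (try_connect()) return attempt;
        if (attempt < max_attempts) {
            // Wait before the next try, then lengthen the wait.
            std::this_thread::sleep_for(delay);
            delay = next_delay(delay, max_delay);
        }
    }
    return 0;
}
```

On an unreliable link you would probably choose a large `max_attempts` (or loop until the user cancels) and a `max_delay` of some tens of seconds; the right values depend on your network.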

Upvotes: 0

tomasz

Reputation: 13052

I believe the subject should say "How to handle", not "How to prevent", right? You definitely want to get this error, as (per your comment) at that point the file transfer would be in a suspended state and the user would be aware of this as well. In order to suspend the transfer and inform the user, you need to get an error.

Anyhow, the 20 seconds you mentioned is probably due to a timeout in your OS/router. The number can vary significantly and you shouldn't rely on it in any way. You could try to update the timeout on every single box on your path, but that's usually not possible and doesn't really solve your problem (you can always lose connectivity for longer than the timeout you have set).

In order to build a solution that is immune to timeouts, you need a simple protocol on top of your raw connection that allows you to reconnect and resume the transfer from a specific offset in your data stream. You could modify your client to send a request with details about the retransmission point.
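As a rough illustration of such a resume request, here is a sketch under an assumed wire format: after reconnecting, the receiver sends a single 8-byte big-endian integer holding the number of bytes it has already stored, and the sender continues from there. The format and the names `ResumeRequest`, `encode_resume_request`, `decode_resume_request`, and `remaining_data` are hypothetical; a real protocol would likely also identify the file and guard against corruption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical wire format: one 8-byte big-endian byte count.
using ResumeRequest = std::array<std::uint8_t, 8>;

// Receiver side: pack the number of bytes already stored.
inline ResumeRequest encode_resume_request(std::uint64_t offset) {
    ResumeRequest out{};
    for (std::size_t i = out.size(); i-- > 0;) {
        out[i] = static_cast<std::uint8_t>(offset & 0xFF);
        offset >>= 8;
    }
    return out;
}

// Sender side: unpack the offset the receiver asked for.
inline std::uint64_t decode_resume_request(const ResumeRequest& in) {
    std::uint64_t offset = 0;
    for (std::uint8_t byte : in) offset = (offset << 8) | byte;
    return offset;
}

// Sender side: the part of the file that still has to be sent.
// A std::string stands in for the file contents to keep the sketch short.
inline std::string remaining_data(const std::string& file, std::uint64_t offset) {
    assert(offset <= file.size());  // a larger offset means the peers disagree
    return file.substr(static_cast<std::size_t>(offset));
}
```

Using a fixed-width big-endian integer keeps the request independent of the byte order on either machine, which matters when one end runs Windows and the other Linux on different hardware.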

If your network is really unreliable and breaks often, you could switch to UDP. Some packets will arrive, some will be missing. You can collect the blocks and request retransmission for parts of data that haven't arrived. You'll probably need to spend a little bit more time on designing the right protocol, but the solution may be superior to standard TCP.
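The bookkeeping for the UDP approach could look roughly like this, assuming the file is split into fixed-size numbered blocks and the receiver knows the total count up front. `BlockTracker` is a made-up helper for illustration; it only tracks which blocks have arrived and leaves the actual sending and receiving of datagrams out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Receiver-side record of which numbered blocks have arrived.
class BlockTracker {
public:
    explicit BlockTracker(std::size_t total_blocks)
        : received_(total_blocks, false) {}

    // Call for every block that arrives; duplicates are harmless.
    void mark_received(std::size_t index) {
        assert(index < received_.size());
        received_[index] = true;
    }

    // Block numbers to ask the sender to retransmit.
    std::vector<std::size_t> missing() const {
        std::vector<std::size_t> out;
        for (std::size_t i = 0; i < received_.size(); ++i) {
            if (!received_[i]) out.push_back(i);
        }
        return out;
    }

    bool complete() const { return missing().empty(); }

private:
    std::vector<bool> received_;
};
```

The receiver would periodically send the result of `missing()` back to the sender until `complete()` returns true. Since that request can itself be lost over UDP, it needs to be repeated on a timer.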

Upvotes: 2
