Reputation: 3637
I am writing a socket program in C++. The program runs on a set of cluster machines.
I am new to socket programming and have only just learned how to send and receive. I expect that, while the program runs for a long time, some TCP connections will be lost. In that case, the server and client need to reconnect smoothly.
I wonder if there is a well-known basic mechanism (or algorithm? protocol?) to achieve this. I found that there are a great many socket error codes with different semantics, which makes it hard for me to get started.
Can anyone suggest reference code that I can learn from?
Thanks,
Upvotes: 0
Views: 2074
Reputation: 118292
The specific error code is irrelevant. If you have an active socket connection, a failed read or write indicates that the connection is gone. The error code may give you some explanation, but it's a bit too late now. The socket is gone. It is no more. It ceased to exist. It's an ex-socket. You can use the error code to come up with a colorful explanation, but that would be little more than minor consolation. Whatever the specific reason, your socket is gone and you have to deal with it.
When using non-blocking sockets, there are certain specific return codes and errno values that indicate that the socket is still fine, but just is not ready to read or write anything. You have to check for and handle those specifically. This would be the only exception to the above.
Also, EINTR does not mean that the socket is broken; it only means that the call was interrupted by a signal. So that is another exception to check for.
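To make those exceptions concrete, here is a minimal sketch of how the errno from a failed call might be sorted into "retry", "not ready", and "fatal". The enum and the function name classify_socket_error are hypothetical, and the sketch assumes a POSIX system where the "not ready" codes for non-blocking sockets are EAGAIN and EWOULDBLOCK.

```cpp
#include <cerrno>

// Hypothetical classification of the errno left by a failed
// read()/write()/recv()/send() on a connected socket.
enum class SocketError { Retry, NotReady, Fatal };

SocketError classify_socket_error(int err) {
    // Interrupted by a signal: the socket is fine, repeat the call.
    if (err == EINTR) return SocketError::Retry;
    // Non-blocking socket with nothing to read or no room to write
    // (assumed to be the "not ready" codes the answer refers to).
    if (err == EAGAIN || err == EWOULDBLOCK) return SocketError::NotReady;
    // Anything else: treat the connection as gone.
    return SocketError::Fatal;
}
```

A caller would pass errno immediately after the failed call, before any other library call can overwrite it.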
Once you have a broken socket, the only general design principle, if there is one, is that you have to close() it as the first order of business. The file descriptor is completely useless. After that point, it's entirely up to you what to do next. There are no rules etched in stone for this situation. Typically, applications log an error in some form, attempt to make another connection, or both.
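One possible shape for "close it, then try to connect again" is sketched below. Nothing here is prescribed by the answer: the helper names connect_once and reconnect, the IPv4-only addressing, the retry count, and the doubling delay capped at 5 seconds are all assumptions made for illustration.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <algorithm>
#include <chrono>
#include <thread>

// Hypothetical helper: one blocking TCP connect attempt to an IPv4 address.
// Returns a connected descriptor, or -1 on failure.
int connect_once(const char* ip, unsigned short port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1 ||
        connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0) {
        close(fd);  // never leak the descriptor of a failed attempt
        return -1;
    }
    return fd;
}

// Close the broken descriptor first, then retry with a growing delay.
// Returns a fresh connected descriptor, or -1 after max_attempts failures.
int reconnect(int broken_fd, const char* ip, unsigned short port, int max_attempts) {
    if (broken_fd >= 0) close(broken_fd);
    auto delay = std::chrono::milliseconds(100);  // assumed starting delay
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        int fd = connect_once(ip, port);
        if (fd >= 0) return fd;
        std::this_thread::sleep_for(delay);
        delay = std::min(delay * 2, std::chrono::milliseconds(5000));
    }
    return -1;
}
```

Whether to retry forever, give up after a few attempts, or just log and exit is an application decision; the sketch only shows the ordering of close() before any new attempt.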
About the only "well-known basic mechanism" in socket programming is explicit timeouts. Network errors and failures are not always detected immediately by the underlying operating system. It can take many minutes before the protocol stack declares a socket broken and gives you an error indication.
So, if you're coding a particular application, and you know that you should expect to read or write something within some prescribed time frame, a common design pattern is to code an explicit timeout. If nothing happens by the time the timeout expires, assume that the socket is broken, even if you have no explicit error indication otherwise; close() it, then proceed to the next step.
Upvotes: -1
Reputation: 310860
It's not complicated. The only two error codes that aren't fatal to the connection are:
select()/poll()/epoll() has so indicated.
All others are fatal to the connection and should cause you to close it.
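As a rough illustration of "retry the non-fatal cases, close on everything else", the sketch below reads a fixed number of bytes. It assumes the two non-fatal cases are EINTR and EAGAIN/EWOULDBLOCK; the function name read_exact is hypothetical, and waiting with poll() and no timeout is a simplification.

```cpp
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <cerrno>

// Hypothetical helper: read exactly len bytes from a (possibly non-blocking)
// socket. Returns true on success. On end-of-stream or any fatal error it
// closes fd and returns false.
bool read_exact(int fd, char* buf, size_t len) {
    size_t got = 0;
    while (got < len) {
        ssize_t n = recv(fd, buf + got, len - got, 0);
        if (n > 0) {
            got += static_cast<size_t>(n);
            continue;
        }
        if (n < 0 && errno == EINTR) continue;  // interrupted: just retry
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            // Not ready yet: wait until poll() reports the socket readable.
            pollfd pfd{};
            pfd.fd = fd;
            pfd.events = POLLIN;
            poll(&pfd, 1, -1);
            continue;
        }
        close(fd);  // n == 0 (peer closed) or any other error: connection is gone
        return false;
    }
    return true;
}
```

After a false return the descriptor is already closed, so the caller must not use it again.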
Upvotes: 3