Reputation: 3637
I am writing a distributed program in C++ that uses TCP and runs on Linux (CentOS 7, kernel 3.1.0).
The program is built for high performance, with heavy CPU, disk, and network usage.
A single run may last several days, for example four. I am worried about a TCP connection being lost during the computation for any reason other than one of the machines dying.
Can this happen? That is, can a TCP connection be lost while all machines are alive and nobody has called close on the socket?
If it can, what can I as the programmer do about it? Can I detect the lost connection and try to reconnect?
Thanks,
Upvotes: 0
Views: 557
Reputation: 14392
Ideally, connection management is part of the protocol. This way the management is documented and client and server know what is expected.
Some strategies:
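One possible strategy, as a minimal sketch rather than a definitive recipe: let the kernel probe an idle connection with TCP keepalive, and treat certain errno values from send/recv as "connection lost, reconnect". The helper names `enable_keepalive` and `is_connection_lost` are hypothetical, the timing values are up to you, and `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` are Linux-specific options.

```cpp
#include <cerrno>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// Hypothetical helper: turn on TCP keepalive probes for a socket.
// idle_s:     seconds of idleness before the first probe is sent
// interval_s: seconds between unanswered probes
// probes:     unanswered probes before the kernel drops the connection
// Returns true only if every option was set successfully.
bool enable_keepalive(int fd, int idle_s, int interval_s, int probes) {
    int on = 1;
    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) == 0
        && setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) == 0
        && setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof interval_s) == 0
        && setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof probes) == 0;
}

// Hypothetical helper: decide whether an errno value reported by
// send()/recv() means the connection is gone and a reconnect is needed.
// The list is an assumption; extend it to match your protocol.
bool is_connection_lost(int err) {
    switch (err) {
        case ETIMEDOUT:     // keepalive probes or retransmissions timed out
        case ECONNRESET:    // peer sent a reset
        case EPIPE:         // writing to a connection already shut down
        case EHOSTUNREACH:  // no route to the peer any more
            return true;
        default:
            return false;
    }
}
```

A caller would run something like `enable_keepalive(fd, 60, 10, 5)` right after `connect()` or `accept()`, and on a failed `send`/`recv` check `is_connection_lost(errno)` before closing the socket and reconnecting. Keepalive only tells you the connection is dead; what gets resent after a reconnect still has to be defined by the application protocol.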
Upvotes: 1