Reputation: 21
I'm trying to figure out the right pieces to implement a highly available setup with failover for a C-based server application. The TCP connections would ideally stay up for days. If the master server goes down because of an uncontrolled network problem, the standby server should take over as master, with the TCP connections shifted to it.
The data within the socket connections looks very similar to protobuf data structures. It's not HTTP.
Thus far I've been looking at keepalived and HAProxy, but neither seems to allow redirecting or failing over a persistent TCP session to a standby server without disconnecting the session.
What I'm looking for is this: if the master server goes down, the standby server takes over all client TCP connections without disconnecting the TCP sessions.
The master and standby will share a virtual IP (VIP) managed by keepalived.
             | VIP |
 +----+----+          +----+----+
 | Master  | <-VRRP-> | Standby |
 +----+----+          +----+----+
      |                    |
      |                    |
------+---+------------+---+----------
          |
      +---+---+
      |Client |
      +---+---+
What options are there for shifting TCP connections or synchronizing TCP sessions between the master and standby servers, both running RHEL 7.1, so that a connected client cannot tell that the master has gone down and the standby has taken over?
Thanks!
Upvotes: 2
Views: 3122
Reputation: 2448
As others have pointed out, true TCP connection mirroring and failover is not easy and has to be done at the kernel level, because the connection state (sequence numbers, window sizes, unacknowledged data) lives in the kernel's TCP stack. No user-land process will be able to do this for you.
In a past life I even implemented this feature for a commercial load balancer application. Predictably, it needed kernel modifications and wasn't trivial.
Yes, tcpcp was another project for this, but it seems to be mostly abandoned.
As others have asked: are you sure you need this? I would strongly recommend re-architecting the overall application so that clients can deal with a dropped connection by retrying. If this is not possible, then consider an overall architecture where the probability of failure is reduced to the point that an extremely rare loss of connections simply doesn't matter.
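To make the "retry on the client" suggestion concrete, here is a minimal sketch of a client-side reconnect loop with exponential backoff. The function names, the 100 ms base delay, and the 5 s cap are my own assumptions for illustration and are not taken from your application. Whatever session state your protocol needs to resume after a reconnect is not shown.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: delay before retry number `attempt` (0-based).
 * Starts at base_ms and doubles per attempt, capped at max_ms. */
unsigned backoff_delay_ms(unsigned attempt, unsigned base_ms, unsigned max_ms)
{
    assert(base_ms > 0 && base_ms <= max_ms);
    unsigned delay = base_ms;
    while (attempt-- > 0 && delay < max_ms) {
        /* Avoid overflow: jump straight to the cap once doubling would pass it. */
        delay = (delay > max_ms / 2) ? max_ms : delay * 2;
    }
    return delay;
}

/* Hypothetical helper: one blocking TCP connect attempt.
 * Returns a connected socket fd, or -1 on failure. */
int connect_once(const char *host, const char *port)
{
    struct addrinfo hints, *res, *ai;
    int fd = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;  /* TCP */

    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                    /* connected */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}

/* Try to connect up to max_attempts times, sleeping between failures.
 * Returns a connected socket fd, or -1 if every attempt failed. */
int connect_with_retry(const char *host, const char *port, unsigned max_attempts)
{
    for (unsigned attempt = 0; attempt < max_attempts; attempt++) {
        int fd = connect_once(host, port);
        if (fd >= 0)
            return fd;
        if (attempt + 1 < max_attempts) {
            /* Assumed policy: 100 ms base delay, 5 s cap. */
            unsigned ms = backoff_delay_ms(attempt, 100, 5000);
            struct timespec ts = { ms / 1000, (long)(ms % 1000) * 1000000L };
            nanosleep(&ts, NULL);
        }
    }
    return -1;
}
```

The client would call `connect_with_retry()` against the VIP whenever a `read()` or `write()` on the old socket fails. Once keepalived has moved the VIP to the standby, the next attempt lands on the new master.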
You are trying to achieve 100% uptime for your TCP connections. Consider, however, that this is most likely not achievable anyway, since any number of other components may also fail (power, upstream routers, etc.).
Therefore, you need to design this for possible failure anyway.
Upvotes: 1