Reputation: 627
Current Situation: I currently have dozens of sites that send HTML form data to a collection server. This collection server then resends the data on to a processing server later. Having the processing server go down is not a big deal, but losing form data means losing my job.
Goal: I want to ensure there is no single point of failure that would stop HTML form data from being collected.
Possible Solution: My thought was to have three collection servers and send the HTML form data from the websites to each of them. I would then want some way to ensure that only one copy of each lead is passed from the collection servers on to the processing server.
# Users fill Form Data       It is Captured Redundantly       And processed here
website01           ->       collectionServer01         ->    processingServer
website06                    collectionServer02
website24                    collectionServer03
website#N
I think this is called a distributed queue??
Question: Assuming it is a distributed queue I am describing, is that a good way to meet my goal? Are there other approaches people have used? How would you recommend ensuring that only one copy gets sent from the collectionServers to the processingServer?
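To make this concrete, here is a rough sketch of the fan-out I have in mind (nothing here is built yet, and the server URLs are just placeholders); the lead_id is what I imagine the collectionServers could later use to recognize the copies as one lead:

import json
import urllib.request
import uuid

# Placeholder URLs for the three collection servers (not real hosts).
COLLECTION_SERVERS = [
    "https://collection01.example.com/leads",
    "https://collection02.example.com/leads",
    "https://collection03.example.com/leads",
]

def fan_out_lead(form_data):
    """Send the same lead to every collection server.

    One lead_id is generated per submission and attached to every copy,
    so the downstream servers can tell that the copies are the same lead.
    """
    lead_id = str(uuid.uuid4())
    payload = json.dumps({"lead_id": lead_id, "form": form_data}).encode("utf-8")

    for url in COLLECTION_SERVERS:
        req = urllib.request.Request(
            url, data=payload, headers={"Content-Type": "application/json"}
        )
        try:
            urllib.request.urlopen(req, timeout=5)
        except OSError:
            # One collection server being down is fine; the others
            # still receive a copy of the lead.
            pass

    return lead_id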
Upvotes: 0
Views: 79
Reputation: 150138
If I understand your question correctly, you have something like this:

Some Website
Another Website          Intake Server       Processing Server
                         (reliable)          (unreliable)
Yet Another Website
(Customer?) leads flow from many different websites to your Intake Server, and then are forwarded along to the Processing Server. You are concerned about your Intake Server going down, because that is what you are responsible for keeping up.
The classic solution to this problem is to have 2 or more Intake Servers behind a load balancer, and to have a Master and at least one Slave database.
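To make the write path concrete: each Intake Server should durably store the lead in the (replicated) database before acknowledging the browser, so that a crash right after the acknowledgement cannot lose it. A minimal sketch, with SQLite only standing in for whatever replicated database you actually use:

import sqlite3
import uuid

# SQLite is only a stand-in here; in the real setup this would be the
# master of your replicated database.
conn = sqlite3.connect("leads.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS leads ("
    " lead_id TEXT PRIMARY KEY,"
    " form_data TEXT NOT NULL)"
)

def store_lead(form_data, lead_id=None):
    """Durably store one lead; only acknowledge the sender after commit."""
    lead_id = lead_id or str(uuid.uuid4())
    # INSERT OR IGNORE makes the write idempotent: if the same lead_id
    # arrives twice (e.g. a retry), only one row is kept.
    conn.execute(
        "INSERT OR IGNORE INTO leads (lead_id, form_data) VALUES (?, ?)",
        (lead_id, form_data),
    )
    conn.commit()
    return lead_id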
To avoid the risk of losing your service when an entire data center goes down (remember the tsunami in Japan?), run your setup in multiple data centers and use geographic load balancing to send traffic to the nearest data center or, if that one fails, to one of the other data centers.
In that case, you would want to replicate all data between the various data centers (e.g. Master/Master database, with local slaves for redundancy, or Master in Data Center A plus Slave in Data Center A plus Slave of Master A in Data Center B, etc.).
I successfully used that arrangement on several occasions. There are services that manage geo load balancing in a very reliable manner (though they are not exactly cheap).
If an Intake Server goes down, the load balancer detects this condition and routes traffic to the remaining Intake Servers. If the Master database goes down, you switch to the Slave database and recover the Master.
For load balancing in general, I have had great experience using both NGINX and HAProxy as load balancers.
If you send all data to all data centers, the task of coordinating which data center sent which lead to the Processing Server is very non-trivial once you consider that you may lose one or more data centers (how do you know which leads a data center sent before it went down? how do you decide which data center should send which lead?). Even with one "Master" data center and one "Hot Stand-By" data center, it is not trivial for the "Hot Stand-By" to know where to take up work if the "Master" goes down, unless they constantly sync state, as they would with e.g. a replicated database solution.
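One way to sidestep much of that coordination is to make the handoff idempotent: give every lead a unique ID when it is captured, and have whoever forwards it "claim" that ID in a shared store first, so duplicates are simply dropped. A rough sketch of the idea (SQLite stands in for a shared store, and the processing URL is a placeholder):

import json
import sqlite3
import urllib.request

PROCESSING_URL = "https://processing.example.com/leads"  # placeholder

# Shared record of which leads have already been forwarded; the PRIMARY KEY
# on lead_id is what enforces "only one copy".
dedup = sqlite3.connect("forwarded.db")
dedup.execute("CREATE TABLE IF NOT EXISTS forwarded (lead_id TEXT PRIMARY KEY)")

def forward_once(lead_id, form_data):
    """Forward a lead to the Processing Server at most once.

    Returns True if this call did the forwarding, False if the lead was
    already claimed by an earlier call (or another collection server).
    """
    try:
        # Claiming and forwarding are two separate steps, so a crash in
        # between can still lose a lead; a transactional outbox or a
        # distributed queue closes that gap, but this shows the core idea.
        dedup.execute("INSERT INTO forwarded (lead_id) VALUES (?)", (lead_id,))
        dedup.commit()
    except sqlite3.IntegrityError:
        return False  # someone already sent this lead

    payload = json.dumps({"lead_id": lead_id, "form": form_data}).encode("utf-8")
    req = urllib.request.Request(
        PROCESSING_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(req, timeout=5)
    return True

If you control the Processing Server, it is often simpler to let every collection server send at least once and deduplicate on lead_id at the receiving end instead.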
One of the commenters mentioned (a few times) that one can use a distributed queue to solve this problem. That is also a viable route, but one that I have less experience with than the solution I described.
Upvotes: 2