user2813274
user2813274

Reputation: 858

HA Database configuration that avoids split-brain issues?

I am looking for a (SQL/RDB) database setup that works something like this:

  1. I will have 3+ databases in an active/active/active configuration
  2. prior to doing any insert, the database will communicate with atleast a majority of the others, such that they all either insert at the same time or rollback (transaction)
  3. this way I can write and read from any of the databases, and always get the same results (as long as the field wasn't updated very recently)

note: this is for a use case that will be very read-heavy and have few writes (and delay on the writes is an OK situation)

does anything like this exist? I see all sorts of solutions with database HA configurations, but most of them suggest writing to a primary node or having a passive backup

alternatively I could setup a custom application, and have each application talk to exactly 1 database, and achieve a similar result, but I was hoping something similar would already exist

So my questions is: does something like this exist? if not, are there any technical/architectural reasons why not?

P.S. - I will NOT be using a SAN where all databases can store/access the same data

edit: more clarifications as far as what I am looking for: 1. I have no database picked out yet, but I am more familiar with MySQL / SQL Server / Oracle, so I would have a minor inclination towards on of those 2. If a majority of the nodes are down (or a single node can't communicate with the collective), then I expect all writes from that node to fail, and accept that it may provide old/outdated information

failure / recover scenario expectations: 1. A node goes down: it will query and get updates from the other nodes when it comes back up 2. A node loses connection with the collective: it will provide potentially old data to read request, and refuse any writes 3. A node is in disagreement with the data stores in others: majority rule 4. 4. majority rule does not work: go with whomever has the latest data (although this really shouldn't happen) 5. The entries are insert/update/read only, i.e. there will be no deletes (except manually ofc), so I shouldn't need to worry about an update after a delete, however in that case I would choose to delete the record and ignore the update 6. Any other scenarios I missed?

update: I the closest I see to what I am looking for seems to be using a quorum + 2 DBs, however I would prefer if I could have 3 DBs instead, such that I can query any of them at any time (to further distribute the reads, and also to keep another copy of the data)

Upvotes: 0

Views: 250

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269973

You need to define "very recently". In most environments with replication for inserts, all the databases will have the same data within a few seconds of an insert (and a few seconds seems pessimistic).

An alternative approach is a "read-from-one/write-to-all" approach. In this case, reads are spread through the system. Writes are then sent to all nodes by the application (or a common layer that the application uses).

Remember, though, that the issue with database replication is not how it works when it works. The issue is how it recovers when it fails and even how failures are identified. You need to decide what happens when nodes go down, how they recover lost transactions, how you decide that nodes are really synchronized. I would suggest that you peruse the documentation of the database that you are actually using and understand the replication mechanisms provided by that platform.

Upvotes: 1

Related Questions