Reputation: 11
Currently we have the following setup: A Postgresql DB "Master" and global distributed Postgresql Slaves, using Logical Replication for data replication. The master is read/write, the slaves are read-only. The master is high available through virtualization features. But if the whole datacenter fails, we have a problem.
Data Center 1 Global Slaves
+-----------+ +-----------+
| DB1 | | |
| Master DB |--------------------| Slave 1 |
| |-----\ | |
+-----------+ \ +-----------+
\
\
\
\ +-----------+
\ | |
--------| Slave 50 |
| |
+-----------+
Hence, I want to extend the setup with another Postgresql Slave in a second data center. On outage of data center 1, the Postgresql Slave in data center 2 will be promoted to the new master (read-write) and the global slaves will be reconfigured to the new master.
Data Center 1 Global Slaves
+-----------+ +-----------+
| DB1 | | |
| Master DB | /-------| Slave 1 |
| DOWN | / | |
+-----------+ / +-----------+
| /
| /
V /
+-----------+ / +-----------+
| New |-----/ | |
| Master DB |--------------------| Slave 50 |
| DB2 | | |
+-----------+ +-----------+
Data Center 2
If I understand it correct, regardless if I use streaming replication or logical replication between data center 1 and 2, the global slaves need to be re-synced from the new master. Is this correct? If not, how can I achieve a data center migration without needing to resync all global slaves?
If this is not possible by Postgresql means, would it help me to use addons like pgedge or pglogical? I think I could solve it by using Veeam Replication between data center 1 and 2, but that would probably introduce more data loss in case of an migration.
Reading the man page I came to the following conclusions:
Streaming Replication between data center (DC) 1 and 2: the replication slots are not replicated to the slave. Since PG16 I could subscribe the global slaves on DB2. Relaying the replication to the global slaves via data center 2 will prevent problems if DC1 is down, but will cause similar problems if DC2 is down. Resubscribing the global slaves to DB2 will cause re-sync, or if initial sync is disabled, the slaves will not be in sync anymore.
Logical Replication between data center (DC) 1 and 2: Here again the replication slots are not replicated to the slave. Further, DB2 will have a total different LSN and timeline, hence LSNs of DB1 can not be related to LSNs of DB2.
When using streaming replication between DB1 and DB2, and DB2 gets promoted. If I create new replication slots on DB2 with pg_create_logical_replication_slot(), and then on the global slaves change the host
parameter of the subscription's connection string, will the global slaves continue replication on the correct position?
Upvotes: 0
Views: 965
Reputation: 11
After lots of research the answer is: it is possible, and actually heavy discussed. In postgresql world they call it "Logical replication and physical standby failover" or "Failover Slots".
Basically, there must be a streaming replication "replica" which will be promoted as new primary in case the original primary. Further, the replication slots for the logical replication must exist on the replica before it gets promoted. This is difficult, as with Postgresql <16 it is not possible to create replication slots on a replica.
Hence, there exists "hacks" that copy the replication_slot files from the primary to the replica (ie. Patroni uses this technique). Nevertheless there can be several race conditions that can lead to inconsistent data on the logical replication subscriber, especially when the physical replica laged behind the logical replications subscribers, and that subscribers will be switched the the promoted replica (ie. see "A related provider/subscriber de-sync issue" in https://github.com/zalando/patroni/issues/1749).
In detailed discussion can be found in the Postgresql Wiki: https://wiki.postgresql.org/wiki/Logical_replication_and_physical_standby_failover https://wiki.postgresql.org/wiki/Failover_slots
Upvotes: 1