Ramanan
Ramanan

Reputation: 591

Replication on Postgresql pauses when Querying and replication are happening simultaneously

Postgress follows MVCC rules. So any query that is run on a table doesn't conflict with the writes that happen on the table. The query returns the result based on the snapshot at the point of running the query.

Now i have a master and slave. The slave is used by analysts to run queries and to perform analysis. When the slave is replicating and when analyst are running their queries simultaneously, i can see the replication lag for a long time.If the queries are long running, the replication lags a long duration and if the number of writes on the master happens to be pretty high, then i end up losing the WAL files and replication can longer proceed. I just have to spin up another slave. Why does this happen ? How do i allow queries and replication to happen simultaneously on postures ? Is there any parameter setting that i can apply to make this happen ?

Upvotes: 6

Views: 3980

Answers (1)

Craig Ringer
Craig Ringer

Reputation: 324265

The replica can't apply more WAL from the master because the master might've overwritten data blocks still needed by queries running on the replica that're older than any still running on the master. The replica needs older row versions than the master. It's exactly because of MVCC that this pause is necessary.

You probably set a high max_standby_streaming_delay to avoid "canceling statement due to conflict with recovery" errors.

If you turn hot_standby_feedback on, the replica can instead tell the master to keep those rows. But the master can't clean up free space as efficiently then, and it might run out of space in pg_xlog if the standby gets way too far behind.

See PostgreSQL manual: Handling Query Conflicts.

As for the WAL retention part: enable WAL archiving and a restore_command for your standbys. You should really be using it anyway, for point-in-time recovery. PgBarman now makes this easy with the barman get-wal command. If you don't want WAL archiving you can instead set your replica servers up to use a replication slot to connect to the master, so the master knows to retain the WAL they need indefinitely. Of course, that can cause the master to run out of space in pg_xlog and stop running so you need to monitor more closely if you do that.

Upvotes: 3

Related Questions