Crag
Crag

Reputation: 1851

Monitor postgres logical replication

Our logical replication recently stopped because of key conflicts - the wals were coming down but weren't being applied. I'd like to setup some monitoring so I can get notified when this happens, but I wasn't sure which fields to use.

I think I should be able to use the pg_stat_subscription view, but I wasn't sure from its description exactly what its columns represent.

Can I compare the source's pg_current_wal_lsn() to the replica's pg_stat_subscription.received_lsn to make sure the data is coming down?

received_lsn    pg_lsn  Last write-ahead log location received, the initial value of this field being 0

And would comparing the replica's pg_stat_subscription.received_lsn to its pg_stat_subscription.latest_end_lsn let me know if the data is getting replayed?

latest_end_lsn  pg_lsn  Last write-ahead log location reported to origin WAL sender

It's that last question which I'm most unsure about - does 'reported to origin WAL sender' imply that the replication was replayed successfully on the replica? Or is there some other field I'd need to use to know how far the replica has successfully applied the changes from the source database?

Upvotes: 4

Views: 850

Answers (2)

Daria
Daria

Reputation: 15

The answer above is correct, except you need to use restart_lsn instead of confirmed_flush_lsn. restart_lsn is the last wal address that subscriber might still need.

From documentation restart_lsn is:

The address (LSN) of oldest WAL which still might be required by the consumer of this slot and thus won't be automatically removed during checkpoints unless this LSN gets behind more than max_slot_wal_keep_size from the current LSN. NULL if the LSN of this slot has never been reserved.

So the query will be:

SELECT
    pg_wal_lsn_diff (pg_current_wal_lsn (), restart_lsn)
FROM
    pg_replication_slots;

Upvotes: 1

manjunath
manjunath

Reputation: 306

You can find logical-replication lag from source database using the below query:

SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) 
FROM pg_replication_slots;
  • pg_current_wal_lsn() : This function returns the current wal lsn position in source database.
  • confirmed_flush_lsn : this field is present in pg_replication_slots view. This value tells that till which position in WAL (source) the subscriber/logical-client has received the changes sent from source database. This value is sent by subscriber/logical-client via feedback message to source database.

You can use above query in monitoring script.

Upvotes: 2

Related Questions