Alf162
Alf162

Reputation: 493

How to check the replication delay in PostgreSQL?

I would like to measure time between insert data into master-table and slave-table using streaming replication in PostgreSQL 9.3. For this I create table test_time with 2 fields id(serial), t(text). After that added a trigger:

cur_time:=to_char(current_timestamp, 'HH12:MI:SS:MS:US'); update test_time set t=cur_time where id=new.id;

But the time is the same in both tables. How can I measure delay time

Upvotes: 49

Views: 116907

Answers (8)

Ramanan
Ramanan

Reputation: 591

If your database has frequent writes, then the below query is a close approximation to get the slave lag

select now() - pg_last_xact_replay_timestamp() AS replication_delay;

Below is a more accurate query for calculating replication lag for databases with very few writes. If the master doesnt sent down any write to the slave, then pg_last_xact_replay_timestamp() can be constant and hence may not accurately determine the slave lag using the above query.

SELECT 
    CASE 
        WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn() THEN 0 
        ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp()) 
    END AS log_delay;

Upvotes: 17

iuridiniz
iuridiniz

Reputation: 2423

PostgreSQL 10 or later (answer)

For postgresql 10 or later (function pg_last_xlog_receive_location() and others does not exist in this version), I use this:

SELECT
  pg_is_in_recovery() AS is_slave,
  pg_last_wal_receive_lsn() AS receive,
  pg_last_wal_replay_lsn() AS replay,
  pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn() AS synced,
  (
   EXTRACT(EPOCH FROM now()) -
   EXTRACT(EPOCH FROM pg_last_xact_replay_timestamp())
  )::int AS lag;

if you run this query on master, the results will be:

 is_slave | receive | replay | synced | lag 
----------+---------+--------+--------+-----
 f        |         |        |        |    
(1 row)

if you run this query on synced slave, the results will be like:

 is_slave |  receive  |  replay   | synced | lag 
----------+-----------+-----------+--------+-----
 t        | 0/3003128 | 0/3003128 | t      | 214
(1 row)

if you run this query on NOT synced slave, the results will be like:

 is_slave |  receive  |  replay   | synced | lag 
----------+-----------+-----------+--------+-----
 t        | 0/30030F0 | 0/30023B0 | f      | 129
(1 row)

NOTE: lag (seconds) has a special meaning here (it is not the same that replay_lag/write_lag/flush_lag from pg_stat_replication view) and it is only useful when synced column is false, because lag means how many seconds elapsed since the last action was commited. In a low traffic site, this value is useless. But in an high traffic site, synced could (and will) be almost of time false, however if it has lag value small enough that server could be considered synced.

So, in order to discovery if that server is synced, I check (in this order):

  • IF is_slave is f (meaning that is a not a slave, maybe a master, so it is synced);
  • IF synced is t (meaning that is a synced slave, so it is synced);
  • IF (supposing applicable) lag <= :threshold: (meaning that is not a synced slave, but it is not too far from master, so it's synced enough for me).

If you want lag in seconds including decimals, do:

SELECT
  pg_is_in_recovery() AS is_slave,
  pg_last_wal_receive_lsn() AS receive,
  pg_last_wal_replay_lsn() AS replay,
  pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn() AS synced,
  EXTRACT(SECONDS FROM now() - pg_last_xact_replay_timestamp())::float AS lag;

Upvotes: 20

Aman Tiwari
Aman Tiwari

Reputation: 1435

You can use this simple CLI based opensource tool that can provide realtime visualisation about replication lag using various modes e.g. CLI, web mode as well as matplotlib based charts for easy tracking.

Replication-Lag-Visualizer

Feel free to raise any issues or contribute to it.

Web Mode

Upvotes: 1

dbenhur
dbenhur

Reputation: 20408

Alf162 mentioned a good solution in the comments to Craig Ringer's answer; so I'm adding this to clarify.

PostgreSQL has an administrative function pg_last_xact_replay_timestamp() which returns time stamp of the last transaction replayed during recovery. This is the time at which the commit or abort WAL record for that transaction was generated on the primary.

So this query select now()-pg_last_xact_replay_timestamp() as replication_lag on a replica will return a duration representing the difference in time between the current clock and the timestamp of the last WAL record applied from the replication stream.

Note that if the master is not receiving new mutations, there will be no WAL records to stream and the lag calculated this way will grow without actually being a signal of delays in replication. If the master is under more or less continuous mutation, it will be continuously streaming WALs and the above query is a fine approximation of the time delay for changes on the master to materialize on the slave. Accuracy will obviously be affected by how rigorously synchronized the system clocks on the two hosts are.

Upvotes: 46

shurikk
shurikk

Reputation: 544

slightly different version of the correct answer:

postgres=# SELECT
  pg_last_xlog_receive_location() receive,
  pg_last_xlog_replay_location() replay,
  (
   extract(epoch FROM now()) -
   extract(epoch FROM pg_last_xact_replay_timestamp())
  )::int lag;

  receive   |   replay   |  lag  
------------+------------+-------
 1/AB861728 | 1/AB861728 | 2027

the lag is only important when "receive" not equal "replay". execute the query on the replica

Upvotes: 8

linehrr
linehrr

Reputation: 1748

on master, you can do select * from pg_stat_replication;
this will give you:

|  sent_lsn   |  write_lsn  |  flush_lsn  | replay_lsn  

-+-------------+-------------+-------------+-------------

 | 8D/2DA48000 | 8D/2DA48000 | 8D/2DA48000 | 89/56A0D500 

those can tell you where your offsets are. as you can see from this example, replay on the replica is behind.

Upvotes: 0

Vao Tsun
Vao Tsun

Reputation: 51456

as of 10 release:

https://www.postgresql.org/docs/10/static/monitoring-stats.html#pg-stat-replication-view

write_lag interval Time elapsed between flushing recent WAL locally and receiving notification that this standby server has written it (but not yet flushed it or applied it). This can be used to gauge the delay that synchronous_commit level remote_write incurred while committing if this server was configured as a synchronous standby.

flush_lag interval Time elapsed between flushing recent WAL locally and receiving notification that this standby server has written and flushed it (but not yet applied it). This can be used to gauge the delay that synchronous_commit level remote_flush incurred while committing if this server was configured as a synchronous standby.

replay_lag interval Time elapsed between flushing recent WAL locally and receiving notification that this standby server has written, flushed and applied it. This can be used to gauge the delay that synchronous_commit level remote_apply incurred while committing if this server was configured as a synchronous standby.

(formatting mine)

Alas new columns seem to suit only synchronous replication (otherwise master would not know exact lag) thus async replication delay chack seem to remain now()-pg_last_xact_replay_timestamp()...

Upvotes: 8

Craig Ringer
Craig Ringer

Reputation: 324445

You can get the delay in bytes from the master side quite easily using pg_xlog_location_diff to compare the master's pg_current_xlog_insert_location with the replay_location for that backend's pg_stat_replication entry.

This only works when run on the master. You can't do it from the replica because the replica has no idea how far ahead the master is.

Additionally this won't tell you the lag in seconds. In current (as of 9.4 at least) PostgreSQL versions there's no timestamp associated with a commit or a WAL record. So there's no way to tell how long ago a given LSN (xlog position) was.

The only way to get the replica lag in seconds on a current PostgreSQL version is to have an external process commit an update to a dedicated timestamp table periodically. So you can compare current_timestamp on the replica to the timestamp of the most recent entry in that table visible on the replica to see how far the replica is behind. This creates additional WAL traffic that will then have to be kept in your archived WAL for PITR (PgBarman or whatever), so you should balance the increased data use with the granularity of lag detection you require.

PostgreSQL 9.5 may add commit timestamps that will hopefully let you find out how long ago a given commit happened and therefore how far a replica is behind in wall-clock seconds.

Upvotes: 29

Related Questions