Umut TEKİN
Umut TEKİN

Reputation: 962

PostgreSQL Checkpoint Discrepancy

There is problem about that I don' t understand in a database. Our configuration is like following:

archive_mode = on
archive_timeout = 900
checkpoint_timeout = 60min
checkpoint_completion_target = 0.9
max_wal_size = 4GB

We do not hit max_wal_size limit. Our average is 60 + 0.9 = 54 minutes which makes sense.

postgres=# SELECT
total_checkpoints,
seconds_since_start / total_checkpoints / 60 AS minutes_between_checkpoints
FROM
(SELECT
EXTRACT(EPOCH FROM (now() - pg_postmaster_start_time())) AS seconds_since_start,
(checkpoints_timed+checkpoints_req) AS total_checkpoints
FROM pg_stat_bgwriter
) AS sub;
-[ RECORD 1 ]---------------+------------
total_checkpoints           | 240
minutes_between_checkpoints | 54.63359986

Yet, yesterday I checked the latest checkpoit at 13.19:

postgres=# SELECT * FROM pg_control_checkpoint();
-[ RECORD 1 ]--------+-------------------------
checkpoint_lsn       | 862/D67582F0
prior_lsn            | 862/7EBA9A80
redo_lsn             | 862/87008050
redo_wal_file        | 000000030000086200000087
timeline_id          | 3
prev_timeline_id     | 3
full_page_writes     | t
next_xid             | 0:1484144344
next_oid             | 8611735
next_multixact_id    | 151786
next_multi_offset    | 305073
oldest_xid           | 1284151498
oldest_xid_dbid      | 1905285
oldest_active_xid    | 1484144342
oldest_multi_xid     | 1
oldest_multi_dbid    | 1905305
oldest_commit_ts_xid | 0
newest_commit_ts_xid | 0
checkpoint_time      | 2022-09-21 12:19:17+02

So, after latest checkpoint it passed more than 60 minutes, it should have taken another checkpoint. Archive mode is enabled and 15 minutes but it does not take checkpoint. Only possbile explanation is not generating any WAL according to the official document, but we generated lots of WAL, this is very active database(not as much as to fullfil 4 GB WAL). What do I miss?

Thanks!

Upvotes: 0

Views: 1748

Answers (2)

Take Ichiru
Take Ichiru

Reputation: 76

That seems to be fine as checkpoint, the timing there is not correct, but you can enable log_checkpoints=on. Based on your yesterday checkpoint, the WAL difference is about 1.3 - 1.4 GiB hour of write or about 25 MiB/min write, not too much.

For about WAL rotate, your archive_timeout has did that job with 15 minutes to force rotate the 1 WAL file.

For the 4 GiB of max_wal_size, it is just the size-based checkpoint soft limit, so with less than 1.5 GiB of WAL, no size-based checkpoint, just the time-based checkpoint.

But still the 1h checkpoint_timeout seems a bit large but it could depend on the number of data write so if it is OK for you then you can go with that setting.

Have a nice days.

Upvotes: 0

Laurenz Albe
Laurenz Albe

Reputation: 246358

That seems perfectly fine.

With your settings, PostgreSQL will run a checkpoint every hour and time it to take around 54 minutes. So 90% of the time you have some checkpoint activity, and 10% nothing. Of course this timing is not 100% accurate, so don't worry about a minute up or down.

If you want to observe this behavior in more detail, set log_checkpoints = on. Then you will get a log message whenever a checkpoint starts and whenever it completes. Leave this setting on, this is useful information for debugging database problems.

Upvotes: 1

Related Questions