Reputation: 1468
We have a powerful Postgres server (64 cores, 384 GB RAM, 16 15k SAS drives, RAID 10), and several times during the day we rebuild several large datasets, which is very write intensive. Apache and Tomcat also run on the same server.
We're getting this warning about 300 times a day, while rebuilding these datasets, with long stretches where the errors are averaging 2 - 5 seconds apart:
2015-01-15 12:32:53 EST [11403]: [10841-1] LOG: checkpoints are occurring too frequently (2 seconds apart)
2015-01-15 12:32:56 EST [11403]: [10845-1] LOG: checkpoints are occurring too frequently (3 seconds apart)
2015-01-15 12:32:58 EST [11403]: [10849-1] LOG: checkpoints are occurring too frequently (2 seconds apart)
2015-01-15 12:33:01 EST [11403]: [10853-1] LOG: checkpoints are occurring too frequently (3 seconds apart)
These are the related settings:
checkpoint_completion_target 0.7
checkpoint_segments 64
checkpoint_timeout 5min
checkpoint_warning 30s
wal_block_size 8192
wal_buffers 4MB
wal_keep_segments 5000
wal_level hot_standby
wal_receiver_status_interval 10s
wal_segment_size 16MB
wal_sync_method fdatasync
wal_writer_delay 200ms
work_mem 96MB
shared_buffers 24GB
effective_cache_size 128GB
So that means we're writing 1024 MB worth of WAL files every 2 - 5 seconds, sometimes sustained for 15 - 30 minutes.
Do you see any settings we can improve on? Let me know if you need other settings documented.
Could we use "SET LOCAL synchronous_commit TO OFF;" at the beginning of these write-intensive transactions to let these WAL writes happen a bit more in the background, having less impact on the rest of the operations?
The data we're rebuilding is stored elsewhere, so on the off chance the power failed AND the RAID battery backup didn't do it's job, we're not out anything once the dataset gets rebuilt again.
Would "SET LOCAL synchronous_commit TO OFF;" cause any problems if this continues for 15 - 30 minutes? Or cause any problems with our streaming replication, which uses WAL senders?
Thanks!
PS. I'm hoping Samsung starts shipping their SM1715 3.2 TB PCIe enterprise SSD, since I think it would solve our problems nicely.
Upvotes: 18
Views: 45292
Reputation: 4371
Your server is generating so much WAL data due to the wal_level
set to hot_standby
. I'm assuming you need this, so the best option to avoid the warnings is to increase your checkpoint_segments
. But they are just that - warnings - it's quite common and perfectly normal to see them during bulk updates and data loads. You just happen to be updating frequently.
Changing synchronous_commit
does not change what is written to the WAL, but rather the timing of when the commit returns to allow the OS to buffer those writes.
It may not apply to your schema, but you could potentially save some WAL data by using unlogged tables for your data rebuilds. Your replicas wouldn't have access to those tables, but after the rebuild you would be able to update your logged tables from their unlogged siblings.
Upvotes: 25