Postgres: Checkpoints Are Occurring Too Frequently

Question

We have a powerful Postgres server (64 cores, 384 GB RAM, 16 15k RPM SAS drives in RAID 10), and several times a day we rebuild several large datasets, which is very write-intensive. Apache and Tomcat also run on the same server.

We're getting this warning about 300 times a day while rebuilding these datasets, with long stretches where the warnings average 2 to 5 seconds apart:

2015-01-15 12:32:53 EST [11403]: [10841-1] LOG:  checkpoints are occurring too frequently (2 seconds apart)
2015-01-15 12:32:56 EST [11403]: [10845-1] LOG:  checkpoints are occurring too frequently (3 seconds apart)
2015-01-15 12:32:58 EST [11403]: [10849-1] LOG:  checkpoints are occurring too frequently (2 seconds apart)
2015-01-15 12:33:01 EST [11403]: [10853-1] LOG:  checkpoints are occurring too frequently (3 seconds apart)
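
To put numbers on how closely spaced these warnings are, a small script along the following lines could tally the reported intervals from the log. This is only a sketch: the function name `checkpoint_intervals` and the regular expression are illustrative assumptions, and the sample lines are the four shown above.

```python
import re
from statistics import mean

# Captures the interval that PostgreSQL reports in its checkpoint warning.
# "seconds?" also accepts the singular form "1 second apart".
WARNING_RE = re.compile(
    r"checkpoints are occurring too frequently \((\d+) seconds? apart\)"
)


def checkpoint_intervals(log_lines):
    """Return the 'seconds apart' value from every checkpoint warning line."""
    intervals = []
    for line in log_lines:
        match = WARNING_RE.search(line)
        if match:
            intervals.append(int(match.group(1)))
    return intervals


# The four log lines quoted in the question.
sample = [
    "2015-01-15 12:32:53 EST [11403]: [10841-1] LOG:  checkpoints are occurring too frequently (2 seconds apart)",
    "2015-01-15 12:32:56 EST [11403]: [10845-1] LOG:  checkpoints are occurring too frequently (3 seconds apart)",
    "2015-01-15 12:32:58 EST [11403]: [10849-1] LOG:  checkpoints are occurring too frequently (2 seconds apart)",
    "2015-01-15 12:33:01 EST [11403]: [10853-1] LOG:  checkpoints are occurring too frequently (3 seconds apart)",
]

intervals = checkpoint_intervals(sample)
print(len(intervals), min(intervals), max(intervals), mean(intervals))
# prints: 4 2 3 2.5
```

Run against a full day's log file, the same function would show how many of the roughly 300 warnings fall in the 2 to 5 second range.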

These are the related settings:

checkpoint_completion_target    0.7
checkpoint_segments             64
checkpoint_timeout              5min
checkpoint_warning              30s
wal_block_size                  8192
wal_buffers                     4MB
wal_keep_segments               5000
wal_level                       hot_standby
wal_receiver_status_interval    10s
wal_segment_size                16MB
wal_sync_method                 fdatasync
wal_writer_delay                200ms
work_mem                        96MB
shared_buffers                  24GB
effective_cache_size            128GB

If each of these checkpoints is triggered by filling checkpoint_segments (64 segments of 16 MB each), that means we're writing about 1024 MB of WAL every 2 to 5 seconds, sometimes sustained for 15 to 30 minutes.
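
The arithmetic behind that estimate can be sketched as follows. It assumes every checkpoint is triggered by reaching checkpoint_segments rather than by checkpoint_timeout, which seems safe with checkpoints only seconds apart; the function name `wal_rate_mb_per_s` is made up for illustration.

```python
def wal_rate_mb_per_s(checkpoint_segments, segment_size_mb, seconds_between_checkpoints):
    """Approximate WAL write rate when checkpoints are segment-triggered."""
    # WAL volume written between two consecutive checkpoints.
    wal_mb_per_checkpoint = checkpoint_segments * segment_size_mb
    return wal_mb_per_checkpoint / seconds_between_checkpoints


# Settings from the question: checkpoint_segments = 64, wal_segment_size = 16MB.
for seconds in (2, 5):
    rate = wal_rate_mb_per_s(64, 16, seconds)
    print(f"{seconds}s apart: {rate:.0f} MB/s")
# prints:
# 2s apart: 512 MB/s
# 5s apart: 205 MB/s
```

So the rebuilds would be generating somewhere between roughly 200 and 500 MB of WAL per second during those stretches.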

Do you see any settings we can improve on? Let me know if you need other settings documented.
Could we use "SET LOCAL synchronous_commit TO OFF;" at the beginning of these write-intensive transactions to let these WAL writes happen a bit more in the background, having less impact on the rest of the operations?

The data we're rebuilding is stored elsewhere, so on the off chance the power failed AND the RAID battery backup didn't do its job, we're not out anything once the dataset gets rebuilt again.

Would "SET LOCAL synchronous_commit TO OFF;" cause any problems if this continues for 15 - 30 minutes? Or cause any problems with our streaming replication, which uses WAL senders?
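
For concreteness, the pattern being asked about might look like the sketch below. It is written against a generic Python DB-API 2.0 connection and assumes the driver opens a transaction implicitly on the first statement (as psycopg2 does by default). The function name `run_rebuild` and the `statements` list are hypothetical placeholders, not the actual rebuild code.

```python
def run_rebuild(conn, statements):
    """Run the rebuild statements in one transaction with asynchronous commit.

    `conn` is assumed to be a DB-API 2.0 connection that starts a transaction
    implicitly on the first execute() and ends it on commit() or rollback().
    """
    cur = conn.cursor()
    try:
        # SET LOCAL only applies until the end of the current transaction,
        # so other sessions and later transactions keep the server default.
        cur.execute("SET LOCAL synchronous_commit TO OFF")
        for sql in statements:
            cur.execute(sql)
        conn.commit()
    except Exception:
        # Undo the partial rebuild and let the caller see the error.
        conn.rollback()
        raise
    finally:
        cur.close()
```

Because the setting is scoped to the transaction, it would have to be issued at the start of every write-intensive transaction rather than once per session.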

Thanks!

PS. I'm hoping Samsung starts shipping their SM1715 3.2 TB PCIe enterprise SSD, since I think it would solve our problems nicely.
