Reputation: 553
I have a Ruby on Rails app that is using a Postgresql database. I've noticed that my database performance has huge spikes every 5-7 minutes.
I'm seeing 1+ second response times for simple queries like:
UPDATE users SET last_seen_at = ? where id = ?
or
INSERT INTO emails (email, created_at, updated_at) VALUES (?, ?, ?)
The VPS is an AWS EC2 instance (m2.2xlarge) with a 4 core Xeon 2.4ghz and 34gb of memory.
I made to following changes to the conf to try to figure it out (like reducing the # of checkpoint timeouts) to no avail.
root:/etc/postgresql/9.2/main# diff postgresql.conf.bck postgresql.conf
176,178c176,178
< #checkpoint_segments = 3 # in logfile segments, min 1, 16MB each
< #checkpoint_timeout = 5min # range 30s-1h
< #checkpoint_completion_target = 0.5 # checkpoint target duration, 0.0 - 1.0
---
> checkpoint_segments = 10 # in logfile segments, min 1, 16MB each
> checkpoint_timeout = 30min # range 30s-1h
> checkpoint_completion_target = 0.9 # checkpoint target duration, 0.0 - 1.0
361c361
< #log_min_duration_statement = -1 # -1 is disabled, 0 logs all statements
---
> log_min_duration_statement = 2s # -1 is disabled, 0 logs all statements
370,371c370,371
< #debug_print_rewritten = off
< #debug_print_plan = off
---
> #debug_print_rewritten = on
> #debug_print_plan = on
376c376
< #log_duration = off
---
> #log_duration = on
378c378
< #log_hostname = off
---
> #log_hostname = on
399c399
< #log_lock_waits = off # log lock waits >= deadlock_timeout
---
> log_lock_waits = on # log lock waits >= deadlock_timeout
Upvotes: 1
Views: 2404
Reputation: 44137
You have a serious IO problem at the end of the checkpoints. Note that the slow queries are mostly COMMIT, which should do almost nothing but flush the WAL log, and that it took 41.604 s to sync the files (including 11 s to sync one file!).
There is probably nothing much you can do from within PostgreSQL to improve this. I've heard rumors that lowering shared_buffers might help, but I have not seen that first hand.
You probably need to make changes on the operating system, like lowering /proc/sys/vm/dirty_ratio
so that it doesn't allow so much dirty data to build up between checkpoints. Also, if you can separate your WAL logs to separate disks from the main data, that can help.
What filesystem are you using? What kernel/distro?
There is also the possibility that your workload simply can't be accommodated by the IO system you are using, and you need to move to more capable hardware.
Upvotes: 1