oᴉɹǝɥɔ

Reputation: 2055

Postgresql large table update slows down

I run an update on a large table (e.g. 8 GB). It is a simple update of 3 fields in the table. I had no problems running it under PostgreSQL 9.1; it would take 40-60 minutes, but it worked. I run the same query on a 9.4 database (freshly created, not upgraded) and it starts the update fine but then slows down. It uses only ~2% CPU, the level of IO is 4-5 MB/s, and it just sits there. No locks, no other queries or connections, just this single update SQL on the server.

The SQL is below. The "lookup" table has 12 records. The lookup can return only one row, since it breaks a discrete scale (SMALLINT, -32768 .. +32767) into non-overlapping regions. The "src" and "dest" tables are ~60 million records each.

UPDATE dest SET
    field1 = src.field1,
    field2 = src.field2,
    field3_id = (SELECT lookup.id FROM lookup WHERE src.value BETWEEN lookup.min AND lookup.max)
FROM src
WHERE dest.id = src.id;
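To make the correlated subquery's semantics concrete: each src.value falls into exactly one non-overlapping region of the lookup table. A minimal Python sketch of that mapping (the region boundaries and ids below are hypothetical stand-ins for the real 12-row lookup table):

```python
import bisect

# Hypothetical stand-in for the "lookup" table: non-overlapping
# [min, max] regions over the SMALLINT range, each mapped to an id.
REGIONS = [
    (-32768, -10000, 1),
    (-9999,      -1, 2),
    (0,        9999, 3),
    (10000,   32767, 4),
]

# Lower bounds, precomputed for binary search.
_MINS = [lo for lo, _hi, _id in REGIONS]

def lookup_id(value):
    """Return the id of the region containing value, i.e. the Python
    analogue of: SELECT id FROM lookup WHERE value BETWEEN min AND max."""
    i = bisect.bisect_right(_MINS, value) - 1
    if i < 0:
        return None
    lo, hi, region_id = REGIONS[i]
    return region_id if lo <= value <= hi else None
```

Because the regions are non-overlapping, the subquery is guaranteed to return at most one row, so it is safe as a scalar expression in the SET list.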

I thought my disk slowed down but I can copy 1 GB files in parallel to query execution and it runs fast at >40MB/s and I have only one disk (it is a VM with ISCSI media). All other disk operations are not impacted, there is plenty of IO bandwidth. At the same time PostgreSQL is just sitting there doing very little, running very slowly.

I have 2 virtualized linux servers, one runs postgresql 9.1 and another runs 9.4. Both servers have close to identical postgresql configuration.

Has anyone else had similar experience? I am running out of ideas. Help.

Edit: The query "ran" for 20 hours, so I had to kill the connections and restart the server. Surprisingly, terminating the connection via this query didn't work:

SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE pid <> pg_backend_pid() AND datname = current_database();

and the server produced the following log:

2015-05-21 12:41:53.412 EDT FATAL:  terminating connection due to administrator command
2015-05-21 12:41:53.438 EDT FATAL:  terminating connection due to administrator command
2015-05-21 12:41:53.438 EDT STATEMENT:  UPDATE <... this is 60,000,000 record table update statement>

Also, the server restart took a long time, producing the following log:

2015-05-21 12:43:36.730 EDT LOG:  received fast shutdown request
2015-05-21 12:43:36.730 EDT LOG:  aborting any active transactions
2015-05-21 12:43:36.730 EDT FATAL:  terminating connection due to administrator command
2015-05-21 12:43:36.734 EDT FATAL:  terminating connection due to administrator command
2015-05-21 12:43:36.747 EDT LOG:  autovacuum launcher shutting down
2015-05-21 12:44:36.801 EDT LOG:  received immediate shutdown request
2015-05-21 12:44:36.815 EDT WARNING:  terminating connection because of crash of another server process
2015-05-21 12:44:36.815 EDT DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

"The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory" - is this an indication of a bug in PostgreSQL?

Edit: I tested 9.1, 9.3 and 9.4. Neither 9.1 nor 9.3 experiences the slowdown. 9.4 consistently slows down on large transactions. I noticed that when a transaction starts, the htop monitor indicates high CPU and the process status is "R" (running). Then it gradually changes to low CPU usage and status "D" - disk wait (see the "Disk waiting" screenshot). My biggest question is: why is 9.4 different from 9.1 and 9.3? I have a dozen servers and this effect is observed across the board.

Upvotes: 3

Views: 2080

Answers (3)

oᴉɹǝɥɔ

Reputation: 2055

Thanks everyone for the help. No matter how much I tried to emphasize the difference in performance between identically configured 9.4 and earlier versions, no one seemed to pay attention to it.

The problem was solved by disabling transparent huge pages:

echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
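Note that these echo commands do not survive a reboot. A sketch of how to verify the current state and persist the setting (assuming a distro that still executes /etc/rc.local at boot; systemd setups may need a small unit file instead):

```shell
# Check the current THP state; the active value is shown in brackets,
# e.g. "always madvise [never]".
cat /sys/kernel/mm/transparent_hugepage/enabled
cat /sys/kernel/mm/transparent_hugepage/defrag

# To persist across reboots, append the same two commands to
# /etc/rc.local (before any "exit 0" line), or pass
# "transparent_hugepage=never" on the kernel command line.
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
```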

Here are some resources I found helpful in researching the issue:
* https://dba.stackexchange.com/questions/32890/postgresql-pg-stat-activity-shows-commit/34169#34169
* https://lwn.net/Articles/591723/
* https://blogs.oracle.com/linux/entry/performance_issues_with_transparent_huge

Upvotes: 4

joop

Reputation: 4503

If only a few of the target rows are actually updated, you can avoid generating new row versions by filtering with IS DISTINCT FROM. This can prevent a lot of useless disk traffic.

UPDATE dest SET
    field1 = src.field1,
    field2 = src.field2,
    field3_id = lu.id
FROM src
JOIN lookup lu ON src.value BETWEEN lu.min AND lu.max
WHERE dest.id = src.id
        -- avoid generating unnecessary row versions
AND     (dest.field1 IS DISTINCT FROM src.field1
        OR dest.field2 IS DISTINCT FROM src.field2
        OR dest.field3_id IS DISTINCT FROM lu.id
        )
        ;
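The reason to use IS DISTINCT FROM rather than plain <> here is NULL handling: with <>, any comparison involving NULL yields NULL (treated as false in WHERE), so changed rows with NULL fields would be silently skipped. A minimal sketch of the semantics in Python, with None standing in for SQL NULL:

```python
def is_distinct_from(a, b):
    """Model SQL's IS DISTINCT FROM: a NULL-safe inequality test,
    using None as a stand-in for SQL NULL."""
    if a is None and b is None:
        return False   # NULL IS DISTINCT FROM NULL -> false
    if a is None or b is None:
        return True    # NULL vs. non-NULL -> true
    return a != b      # ordinary inequality otherwise
```

So a row where dest.field1 is NULL and src.field1 is 'x' still qualifies for the update, which plain <> would miss.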

Upvotes: 0

Tometzky

Reputation: 23890

I'd suspect a lot of disk seeking - 5 MB/s is just about right for very random IO on an ordinary (spinning) hard drive.

As you constantly replace basically all your rows, I'd try setting the dest table's fillfactor to about 45% (alter table dest set (fillfactor=45);) and then running cluster dest using dest_pkey;. This would allow updated row versions to be placed in the same disk sector.

Additionally, running cluster src using src_pkey; so that both tables have their data in the same physical order on disk can also help.

Also remember to run vacuum dest; after every update that large, so that old row versions can be reused in subsequent updates.

Your old server probably evolved its fillfactor naturally over multiple updates. On the new server the table is packed 100% full, so updated rows have to be placed at the end.

Upvotes: 0
