Conrad Scherb

Reputation: 1066

Mass updating Postgres rows in-place to avoid primary key index scans

I've got a large table (25M rows, 30 GB) that needs a large number of foreign keys set to point at a new table I've just added. I've prepared a temporary table that maps the primary key ID of the main table to the appropriate ID in the secondary table, so my tables look like this:

Main_Table: (id: uuid, ..., secondaryTableId: uuid)
Secondary_Table: (id: uuid, ...)
FK_Update_Table: (mainTableId: uuid, secondaryTableId: uuid)
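A minimal DDL sketch of this setup (non-key columns omitted):

-- Sketch of the schema described above; non-key columns omitted.
CREATE TABLE Secondary_Table (
    id uuid PRIMARY KEY
    -- ... other columns
);

CREATE TABLE Main_Table (
    id uuid PRIMARY KEY,
    -- ... other columns
    "secondaryTableId" uuid REFERENCES Secondary_Table (id)
);

CREATE TABLE FK_Update_Table (
    "mainTableId" uuid,
    "secondaryTableId" uuid
);

-- Indexes on each of the columns above.
CREATE INDEX ON Main_Table ("secondaryTableId");
CREATE INDEX ON FK_Update_Table ("mainTableId");
CREATE INDEX ON FK_Update_Table ("secondaryTableId");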

There are indexes on each column detailed above, and FK_Update_Table is populated with the appropriate pairings via a COPY FROM CSV. The issue is that a simple UPDATE query like the one below takes too long (~5 hours):

UPDATE Main_Table
SET "secondaryTableId" = FK_Update_Table."secondaryTableId"
FROM FK_Update_Table
WHERE Main_Table."id" = FK_Update_Table."mainTableId";

I've tried paginating this into much smaller updates, as well as parallelising the update across separate connections, and neither yields much performance improvement. Running EXPLAIN (ANALYZE, BUFFERS, TIMING) on an update of a single row shows that most of the update's time (5 of 7 seconds) comes from the FK scan.
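For reference, one batch of the paginated variant looked roughly like this (the uuid range bounds are illustrative):

-- One batch of the paginated variant; the id range bounds are illustrative.
UPDATE Main_Table
SET "secondaryTableId" = FK_Update_Table."secondaryTableId"
FROM FK_Update_Table
WHERE Main_Table."id" = FK_Update_Table."mainTableId"
  AND Main_Table."id" >= '00000000-0000-0000-0000-000000000000'
  AND Main_Table."id" <  '20000000-0000-0000-0000-000000000000';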

I'm wondering if there's any way to iterate through Main_Table in some order that avoids the primary key index scan on that table. The FK_Update_Table is much smaller (1.5 GB), so I'm thinking I can use pg_prewarm to load that entire table into memory, making the index search on FK_Update_Table faster. I also tried using a cursor with UPDATE ... WHERE CURRENT OF to see if that would update the row in place, but it hasn't provided any increase in performance. Any tips or advice on how to speed this up would be very appreciated.
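For completeness, the prewarm call and the cursor variant I tried look roughly like this (a sketch; the DO block structure and cursor name are my own):

-- Pull the mapping table into shared buffers; its index can be
-- prewarmed the same way by passing the index's name.
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
SELECT pg_prewarm('FK_Update_Table');

-- Cursor variant: walk Main_Table and update each fetched row in place.
DO $$
DECLARE
    cur CURSOR FOR SELECT id FROM Main_Table FOR UPDATE;
    rec record;
    new_id uuid;
BEGIN
    FOR rec IN cur LOOP
        -- Look up the new FK value for the current row, if any.
        SELECT f."secondaryTableId" INTO new_id
        FROM FK_Update_Table f
        WHERE f."mainTableId" = rec.id;
        IF new_id IS NOT NULL THEN
            UPDATE Main_Table
            SET "secondaryTableId" = new_id
            WHERE CURRENT OF cur;
        END IF;
    END LOOP;
END $$;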

Upvotes: 0

Views: 61

Answers (0)
