Reputation: 1066
I've got a large table (25M rows, 30 GB) that needs a foreign key to a new table I've just added populated across a large number of rows. I've prepared a temporary table that maps the primary key ID of the main table to the appropriate ID in the secondary table, so that my tables look like this:
Main_Table: (id: uuid, ..., secondaryTableId: uuid)
Secondary_Table: (id: uuid, ...)
FK_Update_Table: (mainTableId: uuid, secondaryTableId: uuid)
There are indexes on each of the columns detailed above, and FK_Update_Table is filled in with the appropriate pairings via a COPY FROM a CSV file.
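(For reference, the load was along these lines; the file path and explicit column list here are just illustrative.)

COPY FK_Update_Table ("mainTableId", "secondaryTableId")
FROM '/path/to/pairings.csv'
WITH (FORMAT csv);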
The issue is that a simple UPDATE query such as the one below takes too long (~5 hours):
UPDATE Main_Table
SET "secondaryTableId" = FK_Update_Table."secondaryTableId"
FROM FK_Update_Table
WHERE Main_Table."id" = FK_Update_Table."mainTableId"
I've tried paginating this into much smaller updates, as well as parallelising the update across separate connections, and neither yields much of a performance improvement. Running EXPLAIN (ANALYZE, BUFFERS, TIMING) on an update of a single row shows that most of the update time (5 out of 7 seconds) comes from the FK scan.
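For context, the paginated form I tried was shaped roughly like this (a sketch only: the batch size, the keyset placeholder :last_id, and batching on "mainTableId" are illustrative rather than the exact code I ran):

UPDATE Main_Table
SET "secondaryTableId" = fk."secondaryTableId"
FROM (
    SELECT "mainTableId", "secondaryTableId"
    FROM FK_Update_Table
    WHERE "mainTableId" > :last_id  -- keyset boundary carried forward by the driving script
    ORDER BY "mainTableId"
    LIMIT 50000
) AS fk
WHERE Main_Table."id" = fk."mainTableId";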
I'm wondering if there's any way to iterate through Main_Table in any order that avoids the primary key index scan on that table. FK_Update_Table is much smaller (1.5 GB), so I'm thinking I can use pg_prewarm to load that entire table into memory so that the index lookups on FK_Update_Table will be faster.
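Concretely, I mean something like this (assuming the pg_prewarm extension can be installed; the index name is just a guess at what it would be called):

CREATE EXTENSION IF NOT EXISTS pg_prewarm;

-- pull the whole mapping table, and ideally its index on "mainTableId", into shared buffers
SELECT pg_prewarm('FK_Update_Table');
SELECT pg_prewarm('FK_Update_Table_mainTableId_idx');  -- illustrative index name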
I also tried using a cursor with WHERE CURRENT OF ... UPDATE to see if that would update the rows in place, but it hasn't provided any increase in performance (a rough sketch of that attempt is below). Any tips or advice on how to speed this up would be very much appreciated.
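For reference, the cursor attempt was shaped roughly like this (a PL/pgSQL sketch; the loop structure and the FOUND guard are illustrative, not my exact code):

DO $$
DECLARE
    cur CURSOR FOR SELECT "id" FROM Main_Table FOR UPDATE;
    new_id uuid;
BEGIN
    FOR rec IN cur LOOP
        -- look up the mapped secondary ID for the row the cursor is on
        SELECT "secondaryTableId" INTO new_id
        FROM FK_Update_Table
        WHERE "mainTableId" = rec."id";

        IF FOUND THEN
            -- update the row in place via the cursor position
            UPDATE Main_Table
            SET "secondaryTableId" = new_id
            WHERE CURRENT OF cur;
        END IF;
    END LOOP;
END;
$$;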
Upvotes: 0
Views: 61