Reputation: 1355
I have a table with about 30 million rows in a Postgres 9.4 database. The table has six columns: the primary key id, two text columns, one boolean, and two timestamps. There are indices on one of the text columns and, of course, on the primary key.
I want to copy the values in the first timestamp column, call it timestamp_a, into the second timestamp column, call it timestamp_b. To do this, I ran the following query:
UPDATE my_table SET timestamp_b = timestamp_a;
This worked, but it took an hour and 15 minutes to complete, which seems like a really long time to me considering that, as far as I know, it's just copying values from one column to another.
I ran EXPLAIN on the query and nothing seemed particularly inefficient. I then used pgtune to modify my config file; most notably, it increased shared_buffers, work_mem, and maintenance_work_mem.
I re-ran the query and it took essentially the same amount of time, actually slightly longer (an hour and 20 minutes).
What else can I do to improve the speed of this update? Is this just how long it takes to write 30 million timestamps into Postgres? For context, I'm running this on a quad-core MacBook Pro with 16 GB of RAM, on OS X.
Upvotes: 2
Views: 539
Reputation: 656471
@Craig mentioned an optimization for COPY after TRUNCATE: Postgres can skip WAL entries because, if the transaction fails, nobody will ever have seen the new table anyway.
The same optimization applies to tables created with CREATE TABLE AS.
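For reference, the COPY-after-TRUNCATE pattern Craig describes looks roughly like this (the file path is hypothetical, and the WAL skip only kicks in with wal_level = minimal):
BEGIN;
TRUNCATE my_table;
COPY my_table FROM '/path/to/data.csv' (FORMAT csv);  -- hypothetical path; WAL skipped only at wal_level = minimal
COMMIT;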
Details are missing in your description, but if you can afford to write a new table (no concurrent transactions get in the way, no dependencies), then the fastest way might be the following (except if you have big TOAST table entries, i.e., basically big columns):
BEGIN;
LOCK TABLE my_table IN SHARE MODE; -- only for concurrent access
SET LOCAL work_mem = '???? MB'; -- just for this transaction
CREATE TABLE my_table2 AS
SELECT ..., timestamp_a, timestamp_a AS timestamp_b
-- columns in order, timestamp_a overwrites timestamp_b
FROM my_table
ORDER BY ??; -- optionally cluster table while being at it.
DROP TABLE my_table;
ALTER TABLE my_table2 RENAME TO my_table;
ALTER TABLE my_table
   ADD CONSTRAINT my_table_id_pk PRIMARY KEY (id);
-- more constraints, indices, triggers?
-- recreate views etc. if any
COMMIT;
The additional benefit: a pristine (optionally clustered) table without bloat.
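If you want to verify the effect on bloat, you can compare the table's on-disk size before and after the rewrite with the standard size functions:
SELECT pg_size_pretty(pg_total_relation_size('my_table'));  -- table plus indexes and TOAST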
Upvotes: 1
Reputation: 324375
The reason this is slow is that internally PostgreSQL doesn't update the field in place. It actually writes new rows with the new values. This usually takes a similar time to inserting that many values.
If there are indexes on any columns, this can slow the update down further, even if they're not on the columns being updated, because PostgreSQL has to write a new row and write new index entries pointing to that row. HOT updates can help and will happen automatically where possible, but that generally only helps if the table is subject to lots of small updates. HOT is also disabled if any of the fields being updated are indexed.
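You can observe the row-rewriting behaviour yourself via the system column ctid, which exposes a row version's physical location (the id value here is just for illustration):
SELECT ctid FROM my_table WHERE id = 1;   -- e.g. (0,1)
UPDATE my_table SET timestamp_b = timestamp_a WHERE id = 1;
SELECT ctid FROM my_table WHERE id = 1;   -- a different ctid: a new row version was written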
Since you're basically rewriting the table, if you don't mind locking out all concurrent users while you do it, you can do it faster with the following steps (sketched in SQL below):
BEGIN
DROP all indexes
UPDATE the table
CREATE all indexes again
COMMIT
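A sketch of those steps, assuming a hypothetical index my_table_text_idx on one of the text columns (dropping the primary key's index would instead need ALTER TABLE ... DROP CONSTRAINT):
BEGIN;
DROP INDEX my_table_text_idx;                            -- hypothetical index name
UPDATE my_table SET timestamp_b = timestamp_a;
CREATE INDEX my_table_text_idx ON my_table (text_col);   -- hypothetical column name
COMMIT;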
PostgreSQL also has an optimisation for writes to tables that've just been TRUNCATEd, but to benefit from that, you'd have to copy the data to a temp table, then TRUNCATE the original and copy it back. So there's no benefit here.
Upvotes: 2