steve

Reputation: 1355

Copying timestamp columns within a Postgres table

I have a table with about 30 million rows in a Postgres 9.4 db. This table has 6 columns: the primary key id, two text columns, one boolean, and two timestamp columns. There are indices on one of the text columns and, obviously, on the primary key.

I want to copy the values in the first timestamp column, call it timestamp_a, into the second timestamp column, call it timestamp_b. To do this, I ran the following query:

UPDATE my_table SET timestamp_b = timestamp_a;

This worked, but it took an hour and 15 minutes to complete, which seems like a really long time to me considering that, as far as I know, it's just copying values from one column to another.

I ran EXPLAIN on the query and nothing seemed particularly inefficient. I then used pgtune to modify my config file, most notably increasing shared_buffers, work_mem, and maintenance_work_mem.
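(As a sanity check after editing the config, you can confirm the new values actually took effect in your session, since changes require a reload or restart depending on the setting:)

SHOW shared_buffers;
SHOW work_mem;
SHOW maintenance_work_mem;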

I re-ran the query and it took essentially the same amount of time, actually slightly longer (an hour and 20 mins).

What else can I do to improve the speed of this update? Is this just how long it takes to write 30 million timestamps into Postgres? For context, I'm running this on a MacBook Pro (OS X, quad-core, 16 GB of RAM).

Upvotes: 2

Views: 539

Answers (2)

Erwin Brandstetter

Reputation: 656471

@Craig mentioned an optimization for COPY after TRUNCATE: Postgres can skip writing WAL entries, because if the transaction fails, nobody will ever have seen the new table anyway.

The same optimization is true for tables created with CREATE TABLE AS:

Details are missing in your description, but if you can afford to write a new table (no concurrent transactions getting in the way, no dependencies), then the fastest way might be the following (except if you have big TOAST table entries, i.e. big columns):

BEGIN;
LOCK TABLE my_table IN SHARE MODE;  -- only for concurrent access

SET LOCAL work_mem = '???? MB';  -- just for this transaction

CREATE TABLE my_table2 AS
SELECT ..., timestamp_a, timestamp_a AS timestamp_b
       -- columns in order, timestamp_a overwrites timestamp_b
FROM   my_table
ORDER  BY ??;  -- optionally cluster table while being at it.

DROP TABLE my_table;
ALTER TABLE my_table2 RENAME TO my_table;

ALTER TABLE my_table
  ADD CONSTRAINT my_table_id_pk PRIMARY KEY (id);

-- more constraints, indices, triggers?
-- recreate views etc. if any

COMMIT;

The additional benefit: a pristine (optionally clustered) table without bloat.

Upvotes: 1

Craig Ringer

Reputation: 324375

The reason this is slow is that internally PostgreSQL doesn't update the row in place. Because of MVCC, it writes a new version of each row with the new values, which usually takes about as long as inserting that many rows.

If there are indexes on any columns, this can slow the update down further, even if they're not on the columns being updated, because PostgreSQL has to write a new row version and new index entries pointing to it. HOT updates can help and kick in automatically when possible, but they generally only help when the table is subject to lots of small updates, and they're disabled when any of the columns being updated are indexed.
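(If you want to check whether any of your updates are HOT at all, the statistics collector exposes per-table counters; the table name here is the one from the question:)

SELECT n_tup_upd, n_tup_hot_upd
FROM   pg_stat_user_tables
WHERE  relname = 'my_table';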

Since you're basically rewriting the table anyway, if you don't mind locking out all concurrent users while you do it, you can do it faster with:

  • BEGIN
  • DROP all indexes
  • UPDATE the table
  • CREATE all indexes again
  • COMMIT
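(Sketched as SQL; the index name and column below are hypothetical stand-ins for your actual text-column index, which you can list with \d my_table. The primary key index stays, since you can't drop it without dropping the constraint:)

BEGIN;
DROP INDEX my_table_text_col_idx;          -- assumed name; use yours
UPDATE my_table SET timestamp_b = timestamp_a;
CREATE INDEX my_table_text_col_idx ON my_table (text_col);
COMMIT;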

PostgreSQL also has an optimisation for writes to tables that have just been TRUNCATEd, but to benefit from it here you'd have to copy the data out to a temp table, TRUNCATE, and copy it back, so there's no net benefit.

Upvotes: 2
