Reputation: 6740
I have a Postgres 9.1.3 table with 2.06 million rows matching WHERE Y=1 as per below (the table only has a few tens of thousands more rows in total without any WHERE). I am trying to populate an empty column with a query like this:
WITH B AS (
    SELECT Z,
           rank() OVER (ORDER BY L, N, M, P) AS X
    FROM A
    WHERE Y=1
)
UPDATE A
SET X = B.X
FROM B
WHERE A.Y=1
AND B.Z = A.Z;
This query runs for hours and appears to progress very slowly. In fact, the second time I tried it, there was a power outage after the query had run for ~3 hours. After restoring power, I analyzed the table and got this:
INFO: analyzing "consistent.master"
INFO: "master": scanned 30000 of 69354 pages, containing 903542 live rows and 153552 dead rows; 30000 rows in sample, 2294502 estimated total rows
Total query runtime: 60089 ms.
Is it correct to interpret that the query had barely progressed in those hours?
I had done a VACUUM FULL and ANALYZE before running the long query.
The query within the WITH only takes 40 seconds.
All fields referenced above except A.X, and by extension B.X, are indexed: L, M, N, P, Y, Z.
This is being run on a laptop with 8 GB RAM, a Core i7 Q720 1.6 GHz quad core processor, and Windows 7 x64. I am running Postgres 32 bit for compatibility with PostGIS 1.5.3. 64 bit PostGIS for Windows isn't available yet. (32 bit Postgres means it can't use more than 2 GB RAM in Windows, but I doubt that's an issue here.)
Here's the result of EXPLAIN:
Update on A  (cost=727684.76..945437.01 rows=2032987 width=330)
  CTE B
    ->  WindowAgg  (cost=491007.50..542482.47 rows=2058999 width=43)
          ->  Sort  (cost=491007.50..496155.00 rows=2058999 width=43)
                Sort Key: A.L, A.N, A.M, A.P
                ->  Seq Scan on A  (cost=0.00..85066.80 rows=2058999 width=43)
                      Filter: (Y = 1)
  ->  Hash Join  (cost=185202.29..402954.54 rows=2032987 width=330)
        Hash Cond: ((B.Z)::text = (A.Z)::text)
        ->  CTE Scan on B  (cost=0.00..41179.98 rows=2058999 width=88)
        ->  Hash  (cost=85066.80..85066.80 rows=2058999 width=266)
              ->  Seq Scan on A  (cost=0.00..85066.80 rows=2058999 width=266)
                    Filter: (Y = 1)
Upvotes: 2
Views: 17499
Reputation: 4237
WITH B AS (
-- list all the columns in your table in order including the one to be "updated"
SELECT L, N, M, P,
rank() OVER (ORDER BY L, N, M, P) AS X,
Y, Z
FROM A
WHERE Y=1
), D AS (
DELETE FROM A WHERE Y=1
)
INSERT INTO A
SELECT * FROM B;
I have used this a couple of times now to fix a never-ending update. The pattern above completed for me on millions of rows in minutes.
One of those times I had primary key conflicts. That was solved by DROPping the primary key before running the above statement and re-CREATEing it afterwards. I am surprised that's necessary, since in my CTE the conflicting values are deleted before the insert happens, but oh well.
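For reference, the drop/recreate bracket might look like the sketch below. The constraint name a_pkey and the key column Z are assumptions here; substitute whatever your actual primary key is called:

```sql
-- Hypothetical constraint name (a_pkey) and key column (Z); adjust to your schema.
ALTER TABLE A DROP CONSTRAINT a_pkey;

-- ... run the WITH B ... DELETE ... INSERT statement above ...

ALTER TABLE A ADD CONSTRAINT a_pkey PRIMARY KEY (Z);
```

Rebuilding the index once at the end is also generally cheaper than maintaining it row by row during a bulk insert.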
Upvotes: 1
Reputation: 1294
Try rewriting the query like this:
UPDATE A
SET X = B.X
FROM B
WHERE A.Y=1
AND B.Z = A.Z
AND A.X IS DISTINCT FROM B.X;
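The added predicate skips rows whose X already holds the computed value, so the update does not rewrite unchanged rows (each rewritten row creates a dead tuple). IS DISTINCT FROM is used rather than <> because it is NULL-safe, which matters here since the column starts out empty. A minimal illustration:

```sql
-- A plain inequality is unknown (NULL) when one side is NULL,
-- so <> would wrongly skip the still-empty rows:
SELECT NULL <> 1;                -- NULL
SELECT NULL IS DISTINCT FROM 1;  -- true  (empty row still gets updated)
SELECT 1 IS DISTINCT FROM 1;     -- false (already-correct row is skipped)
```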
Upvotes: 0
Reputation: 7307
There could be multiple solutions.
Upvotes: 5