Reputation: 6740
I have a Postgres 9.1.3 table with 2.06 million rows matching WHERE Y=1 as per below (the table only has a few tens of thousands more rows in total without any WHERE). I am trying to populate an empty column with a query like this:
WITH B AS (
    SELECT Z,
           rank() OVER (ORDER BY L, N, M, P) AS X
    FROM A
    WHERE Y=1
)
UPDATE A
SET X = B.X
FROM B
WHERE A.Y=1
AND B.Z = A.Z;
This query runs for hours and appears to progress very slowly. In fact, the second time I tried it, there was a power outage after the query had run for ~3 hours. After restoring power, I analyzed the table and got this:
INFO: analyzing "consistent.master"
INFO: "master": scanned 30000 of 69354 pages, containing 903542 live rows and 153552 dead rows; 30000 rows in sample, 2294502 estimated total rows
Total query runtime: 60089 ms.
Is it correct to interpret that the query had barely progressed in those hours?
I had done a VACUUM FULL and ANALYZE before running the long query.
The query within the WITH only takes 40 seconds.
All fields referenced above except A.X, and by extension B.X, are indexed: L, M, N, P, Y, Z.
This is being run on a laptop with 8 GB RAM, a Core i7 Q720 1.6 GHz quad core processor, and Windows 7 x64. I am running Postgres 32 bit for compatibility with PostGIS 1.5.3. 64 bit PostGIS for Windows isn't available yet. (32 bit Postgres means it can't use more than 2 GB RAM in Windows, but I doubt that's an issue here.)
Here's the result of EXPLAIN:
Update on A  (cost=727684.76..945437.01 rows=2032987 width=330)
  CTE B
    ->  WindowAgg  (cost=491007.50..542482.47 rows=2058999 width=43)
          ->  Sort  (cost=491007.50..496155.00 rows=2058999 width=43)
                Sort Key: A.L, A.N, A.M, A.P
                ->  Seq Scan on A  (cost=0.00..85066.80 rows=2058999 width=43)
                      Filter: (Y = 1)
  ->  Hash Join  (cost=185202.29..402954.54 rows=2032987 width=330)
        Hash Cond: ((B.Z)::text = (A.Z)::text)
        ->  CTE Scan on B  (cost=0.00..41179.98 rows=2058999 width=88)
        ->  Hash  (cost=85066.80..85066.80 rows=2058999 width=266)
              ->  Seq Scan on A  (cost=0.00..85066.80 rows=2058999 width=266)
                    Filter: (Y = 1)
Upvotes: 2
Views: 17499
Reputation: 4237
WITH B AS (
-- list all the columns in your table in order including the one to be "updated"
SELECT L, N, M, P,
rank() OVER (ORDER BY L, N, M, P) AS X,
Y, Z
FROM A
WHERE Y=1
), D AS (
DELETE FROM A WHERE Y=1
)
INSERT INTO A
SELECT * FROM B;
I have used this a couple of times now to fix a never-ending update. The pattern above completed for me on millions of rows in minutes.
One of those times I had primary key conflicts. That was solved by DROPping the primary key before running the above statement and re-CREATEing it afterwards. I am surprised that's necessary, since in my CTE the conflicting values are deleted before the insert happens, but oh well.
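For reference, the drop/recreate bracket might look like the sketch below. The constraint name a_pkey and the key column Z are assumptions here; substitute whatever your actual primary key is called:

```sql
-- Hypothetical constraint name (a_pkey) and key column (Z); adjust to your schema.
ALTER TABLE A DROP CONSTRAINT a_pkey;

-- ... run the WITH B ... DELETE ... INSERT statement above ...

ALTER TABLE A ADD CONSTRAINT a_pkey PRIMARY KEY (Z);
```

Rebuilding the index once at the end is also generally cheaper than maintaining it row by row during a bulk insert.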
Upvotes: 1
Reputation: 1294
Try rewriting the query like this:
UPDATE A
SET X = B.X
FROM B
WHERE A.Y=1
AND B.Z = A.Z
AND A.X IS DISTINCT FROM B.X;
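The added predicate skips rows whose X already holds the computed value, so the update does not rewrite unchanged rows (each rewritten row creates a dead tuple). IS DISTINCT FROM is used rather than <> because it is NULL-safe, which matters here since the column starts out empty. A minimal illustration:

```sql
-- A plain inequality is unknown (NULL) when one side is NULL,
-- so <> would wrongly skip the still-empty rows:
SELECT NULL <> 1;                -- NULL
SELECT NULL IS DISTINCT FROM 1;  -- true  (empty row still gets updated)
SELECT 1 IS DISTINCT FROM 1;     -- false (already-correct row is skipped)
```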
Upvotes: 0
Reputation: 7307
There could be multiple solutions.
Upvotes: 5