Reputation: 61
I have a Postgres 12 table with more than 20 million rows (the table is partitioned on the customer_column_id) called A and I also have a table with 2.2 million rows called B. I want to update B by selecting data from A but it sometimes is taking 3 minutes and sometimes throwing an exception while reading from stream or the operation times out. The query which I am using is below.
UPDATE B
set "udd_erhvervsfaglig_forloeb" = A.value
from A
where A.customer_column_id = 60
and B.customer_id = A.customer_id
and coalesce(A.value, '') != ''
Query Plan
-> Nested Loop (cost=0.43..15739311932.47 rows=6273088796225 width=85506)
-> Seq Scan on B(cost=0.00..3509632.93 rows=112625893 width=84980)
-> Index Scan using A_66_customer_id_idx1 on A_66 cev (cost=0.43..0.46 rows=1 width=17)
Index Cond: (customer_id = B.customer_id)
Filter: (((COALESCE(value, ''::character varying))::text <> ''::text) AND (customer_column_id = 66))```
Upvotes: 1
Views: 225
Reputation: 131
One of the ways of avoiding nested loops could be using EXISTS. So, your query can be written like this:
UPDATE B
SET "udd_erhvervsfaglig_forloeb" = A.value
FROM A
WHERE
EXISTS (SELECT 1
FROM a
WHERE B.customer_id = A.customer_id
AND A.customer_column_id = 60
AND coalesce(A.value, '') != '');
Upvotes: 0