Yunus Eren Güzel
Yunus Eren Güzel

Reputation: 3088

How to write efficient UPDATE-SELECT sql

I have written an sql for a table that have about 50.000.000 users. The query takes too much time than I expected that it did not finished about 23 hours.

UPDATE users
    SET building_id = B.id
    FROM (
      SELECT *
      FROM buildings B
    ) AS B
    WHERE B.city          = address_city
      AND B.town          = address_town
      AND B.neighbourhood = address_neighbourhood
      AND B.street        = address_street
      AND B.no            = address_building_no

The idea of this sql is to that removing the building/address info from users and instead referencing it to the buildings table.

EXPLAIN

Update on users  (cost=22226900.43..22548054.14 rows=15212 width=166) 
->  Merge Join  (cost=22226900.43..22548054.14 rows=15212 width=166)
         Merge Cond: (((users.address_city)::text = (b.city)::text) AND ((users.address_town)::text = (b.town)::text) AND ((users.address_neighbourhood)::text = (b.neighbourhood)::text) AND ((users.address_street)::text = (b.street)::text) AND ((users.address_building_no)::text = (b.no)::text))
         ->  Sort  (cost=21352886.76..21401078.96 rows=96384398 width=156)
               Sort Key: users.address_city, users.address_town, users.address_neighbourhood, users.address_street, users.address_building_no
               ->  Seq Scan on users  (cost=0.00..2559921.19 rows=96384398 width=156)
         ->  Materialize  (cost=874013.68..883606.86 rows=9593179 width=63)
               ->  Sort  (cost=874013.68..878810.27 rows=9593179 width=63)
                     Sort Key: b.city, b.town, b.neighbourhood, b.street, b.no
                     ->  Seq Scan on buildings b  (cost=0.00..136253.54 rows=9593179 width=63) (10 rows)

I don't know whether this sql uses the inner SELECT sql for each of the users or caches for transaction. Also, if it caches, does it use the indexes for the cached temp table?

I couldn't write the sql like this:

FROM (
  SELECT * 
  FROM buildings B
  WHERE B.city          = users.address_city
    AND B.town          = users.address_town
    AND B.neighbourhood = users.address_neighbourhood
    AND B.street        = users.address_street
    AND B.no            = users.address_building_no
  )

it says that users cannot be accessed from the inner select. Do you have any suggestion how to access buildings in the inner sql statement.

Upvotes: 1

Views: 129

Answers (2)

Vao Tsun
Vao Tsun

Reputation: 51519

I presume

create table t as select column_list from a join b on column=column;
alter table t rename to users;

will be faster, and will produce microsecond lock only... Of course if a table is not editable at the moment and there is enough space in temp_tablespace

Upvotes: 1

Robins Tharakan
Robins Tharakan

Reputation: 2473

Not sure but wouldn't this be (at least slightly, if not considerably) faster?

UPDATE users
SET building_id = B.id
FROM buildings B
WHERE B.city          = address_city
  AND B.town          = address_town
  AND B.neighbourhood = address_neighbourhood
  AND B.street        = address_street
  AND B.no            = address_building_no

If nothing else, it wouldn't require the Materialize stage as given in EXPLAIN above.

Upvotes: 0

Related Questions