queroga_vqz

Reputation: 1049

Update using join on big table - performance tips?

I've been struggling with this update, which never finishes:

update votings v
set voter_id = (select pv.number from voters pv WHERE pv.person_id = v.person_id);

The table currently has 96M records:

select count(0) from votings;
  count   
----------
 96575239
(1 row)

The update apparently is using an index:

explain update votings v                             
set voter_id = (select pv.number from voters pv WHERE pv.rl_person_id = v.person_id);
                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
 Update on votings v  (cost=0.00..788637465.40 rows=91339856 width=1671)
   ->  Seq Scan on votings v  (cost=0.00..788637465.40 rows=91339856 width=1671)
         SubPlan 1
           ->  Index Scan using idx_voter_rl_person_id on voters pv  (cost=0.56..8.58 rows=1 width=9)
                 Index Cond: (rl_person_id = v.person_id)
(5 rows)

Here are the indexes I have on votings:

Indexes:
    "votings_pkey" PRIMARY KEY, btree (id)
    "votings_election_id_voter_id_key" UNIQUE CONSTRAINT, btree (election_id, person_id)
    "votings_external_id_external_source_key" UNIQUE CONSTRAINT, btree (external_id, external_source)
    "idx_votings_updated_at" btree (updated_at DESC)
    "idx_votings_vote_party" btree (vote_party)
    "idx_votings_vote_state_vote_party" btree (vote_state, vote_party)
    "idx_votings_voter_id" btree (person_id)
Foreign-key constraints:
    "votings_election_id_fkey" FOREIGN KEY (election_id) REFERENCES elections(id)
    "votings_voter_id_fkey" FOREIGN KEY (person_id) REFERENCES people_all(id)

Any idea what plays the biggest part in the update running slowly: the number of rows, or the join being used?

Upvotes: 0

Views: 1025

Answers (3)

Gordon Linoff

Reputation: 1270573

Updating all the rows in the table is going to be really expensive. I would suggest re-creating the table:

create table temp_votings as
    select v.*, vv.vote_id
    from votings v join
         voters vv
         on vv.person_id = v.person_id;

For this query, you want an index on voters(person_id, vote_id). I am guessing that person_id might already be the primary key; if so, no additional index is needed.

Then, you can replace the existing table -- but back it up first:

truncate table votings;

insert into votings ( . . . )    -- list columns here
    select . . .                 -- and the same columns here
    from temp_votings;
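The rebuild approach above (join into a staging table, then empty and reload the original) can be illustrated end to end on a toy scale. This is a minimal sketch using Python's bundled sqlite3 in place of Postgres; the miniature tables and sample values are invented for the demonstration, and the column populated is `number` from the question's schema rather than the answer's `vote_id`:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Miniature stand-ins for the question's tables (sample data is invented).
cur.execute("CREATE TABLE votings (id INTEGER PRIMARY KEY, person_id INTEGER, voter_id INTEGER)")
cur.execute("CREATE TABLE voters (person_id INTEGER PRIMARY KEY, number INTEGER)")
cur.executemany("INSERT INTO votings (id, person_id) VALUES (?, ?)",
                [(1, 10), (2, 20), (3, 10)])
cur.executemany("INSERT INTO voters VALUES (?, ?)", [(10, 111), (20, 222)])

# Step 1: build the staging table with a single join -- one pass over the
# data instead of one correlated subquery per updated row.
cur.execute("""
    CREATE TABLE temp_votings AS
    SELECT v.id, v.person_id, pv.number AS voter_id
    FROM votings v JOIN voters pv ON pv.person_id = v.person_id
""")

# Step 2: empty the original and reload it from the staging table.
# (In Postgres you would use TRUNCATE, which is far cheaper than DELETE.)
cur.execute("DELETE FROM votings")
cur.execute("""
    INSERT INTO votings (id, person_id, voter_id)
    SELECT id, person_id, voter_id FROM temp_votings
""")

print(cur.execute("SELECT id, voter_id FROM votings ORDER BY id").fetchall())
# → [(1, 111), (2, 222), (3, 111)]
```

On the real 96M-row table the reload step is where you would also consider dropping and recreating the indexes on votings, since maintaining them row by row during the insert is a large part of the cost.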

Upvotes: 2

Tim Biegeleisen

Reputation: 522226

One suggestion I can make here would be to use a covering index for the subquery lookup:

CREATE INDEX idx_cover ON voters (person_id, number);

While in the context of a plain select this might not offer much advantage over your current index on person_id alone, in the context of this update it can matter more. With both columns in the index, the subquery's lookup can be satisfied by an index-only scan, so Postgres does not have to fetch a heap page from voters for each of the ~96M rows it updates.

Upvotes: 1

Laurenz Albe

Reputation: 247400

If you actually have 91339856 rows in votings, then the 91339856 index scans on voters are certainly the dominant cost factor; a single join that scans voters sequentially will be faster.

You can probably boost performance if you don't force PostgreSQL to do a nested loop join:

UPDATE votings
SET voter_id = voters.number
FROM voters
WHERE votings.person_id = voters.person_id;
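The join form above leaves the planner free to choose a hash or merge join over both tables instead of one index probe per updated row. Its semantics can be reproduced on a toy scale with Python's bundled sqlite3, which accepts the same UPDATE ... FROM syntax from SQLite 3.33 on (the miniature data is invented; this only illustrates the rewrite, not Postgres's planner):

```python
import sqlite3

# UPDATE ... FROM requires SQLite >= 3.33 (bundled with recent CPython builds).
con = sqlite3.connect(":memory:")
cur = con.cursor()

# Miniature stand-ins for the question's tables (sample data is invented).
cur.execute("CREATE TABLE votings (id INTEGER PRIMARY KEY, person_id INTEGER, voter_id INTEGER)")
cur.execute("CREATE TABLE voters (person_id INTEGER PRIMARY KEY, number INTEGER)")
cur.executemany("INSERT INTO votings (id, person_id) VALUES (?, ?)", [(1, 10), (2, 20)])
cur.executemany("INSERT INTO voters VALUES (?, ?)", [(10, 111), (20, 222)])

# The answer's join form: set voter_id from the matching voters row in one
# statement, without a correlated subquery per row.
cur.execute("""
    UPDATE votings
    SET voter_id = voters.number
    FROM voters
    WHERE votings.person_id = voters.person_id
""")

print(cur.execute("SELECT id, voter_id FROM votings ORDER BY id").fetchall())
# → [(1, 111), (2, 222)]
```

Note one semantic difference: the correlated-subquery form in the question sets voter_id to NULL for votings rows with no match in voters, while this join form leaves such rows untouched.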

Upvotes: 1
