Reputation: 1049
I've been struggling with this update, which never finishes:
update votings v
set voter_id = (select pv.number from voters pv WHERE pv.person_id = v.person_id);
The table currently has 96M records:
select count(0) from votings;
count
----------
96575239
(1 row)
The update apparently is using an index:
explain update votings v
set voter_id = (select pv.number from voters pv WHERE pv.rl_person_id = v.person_id);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Update on votings v (cost=0.00..788637465.40 rows=91339856 width=1671)
-> Seq Scan on votings v (cost=0.00..788637465.40 rows=91339856 width=1671)
SubPlan 1
-> Index Scan using idx_voter_rl_person_id on voters pv (cost=0.56..8.58 rows=1 width=9)
Index Cond: (rl_person_id = v.person_id)
(5 rows)
Here are the indexes I have for votings
Indexes:
"votings_pkey" PRIMARY KEY, btree (id)
"votings_election_id_voter_id_key" UNIQUE CONSTRAINT, btree (election_id, person_id)
"votings_external_id_external_source_key" UNIQUE CONSTRAINT, btree (external_id, external_source)
"idx_votings_updated_at" btree (updated_at DESC)
"idx_votings_vote_party" btree (vote_party)
"idx_votings_vote_state_vote_party" btree (vote_state, vote_party)
"idx_votings_voter_id" btree (person_id)
Foreign-key constraints:
"votings_election_id_fkey" FOREIGN KEY (election_id) REFERENCES elections(id)
"votings_voter_id_fkey" FOREIGN KEY (person_id) REFERENCES people_all(id)
Any idea what plays the biggest part in the update running slowly: the number of rows, or the join being used?
Upvotes: 0
Views: 1025
Reputation: 1270573
Updating all the rows in the table is going to be really expensive. I would suggest re-creating the table:
create table temp_votings as
select v.*, vv.number as new_voter_id
from votings v join
     voters vv
     on vv.person_id = v.person_id;
For this query, you want an index on voters(person_id, number). I am guessing that person_id might already be the primary key; if so, no additional index is needed.
Then, you can replace the existing table -- but back it up first:
truncate table votings;
insert into votings ( . . . ) -- list columns here
select . . . -- and the same columns here
from temp_votings;
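The rebuild-and-swap approach above can be sketched end to end. This is a toy illustration using Python's sqlite3 with invented data (the real tables live in Postgres; table and column names follow the question's schema):

```python
import sqlite3

# In-memory database with a miniature version of the question's schema.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE votings (id INTEGER PRIMARY KEY, person_id INTEGER, voter_id INTEGER)")
cur.execute("CREATE TABLE voters (person_id INTEGER PRIMARY KEY, number INTEGER)")
cur.executemany("INSERT INTO votings (id, person_id) VALUES (?, ?)", [(1, 10), (2, 20), (3, 30)])
cur.executemany("INSERT INTO voters VALUES (?, ?)", [(10, 111), (20, 222), (30, 333)])

# Step 1: build the replacement rows with one set-based join instead of
# one correlated subquery lookup per row.
cur.execute("""
    CREATE TABLE temp_votings AS
    SELECT v.id, v.person_id, vv.number AS voter_id
    FROM votings v
    JOIN voters vv ON vv.person_id = v.person_id
""")

# Step 2: empty the original table and reload it from the temp copy
# (back up the original first in real life; TRUNCATE in Postgres).
cur.execute("DELETE FROM votings")
cur.execute("""
    INSERT INTO votings (id, person_id, voter_id)
    SELECT id, person_id, voter_id FROM temp_votings
""")
conn.commit()

rows = cur.execute("SELECT id, voter_id FROM votings ORDER BY id").fetchall()
print(rows)  # every voting row now carries its voter's number
```

Note that an inner join drops votings rows with no matching voter; use a LEFT JOIN in step 1 if such rows must survive the rebuild.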
Upvotes: 2
Reputation: 522226
One suggestion I can make here would be to use a covering index for the subquery lookup:
CREATE INDEX idx_cover ON voters (person_id, number);
While in the context of a SELECT this might not offer much of an advantage over your current index on person_id alone, in the context of an UPDATE it might matter more. The reason is that for an update this index might relieve Postgres from having to create and maintain a copy of the original table in its state before the update.
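The idea of a covering index is that the index alone contains every column the subquery needs, so the lookup never has to touch the table itself. A toy sketch using Python's sqlite3, whose EXPLAIN QUERY PLAN labels such lookups explicitly (in Postgres you would look for an "Index Only Scan" in EXPLAIN instead):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE voters (person_id INTEGER, number INTEGER)")
# Both the search column and the returned column are in the index.
cur.execute("CREATE INDEX idx_cover ON voters (person_id, number)")

# The plan reports a COVERING INDEX lookup: no table access needed.
plan = cur.execute(
    "EXPLAIN QUERY PLAN SELECT number FROM voters WHERE person_id = ?", (1,)
).fetchall()
print(plan)
```
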
Upvotes: 1
Reputation: 247400
If you actually have 91339856 rows in votings, the 91339856 index scans on voters are certainly the dominant cost factor. A sequential scan will be faster.
You can probably boost performance if you don't force PostgreSQL to do a nested loop join:
UPDATE votings
SET voter_id = voters.number
FROM voters
WHERE votings.person_id = voters.person_id;
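The UPDATE ... FROM form above can be tried out on toy data. A sketch with Python's sqlite3 (SQLite also supports UPDATE ... FROM since version 3.33; the real statement runs in Postgres, where this shape lets the planner choose a hash or merge join instead of one index lookup per row):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE votings (id INTEGER PRIMARY KEY, person_id INTEGER, voter_id INTEGER)")
cur.execute("CREATE TABLE voters (person_id INTEGER PRIMARY KEY, number INTEGER)")
cur.executemany("INSERT INTO votings (id, person_id) VALUES (?, ?)", [(1, 10), (2, 20)])
cur.executemany("INSERT INTO voters VALUES (?, ?)", [(10, 111), (20, 222)])

# One set-based join updates every row in a single statement
# (requires SQLite >= 3.33 for the UPDATE ... FROM syntax).
cur.execute("""
    UPDATE votings
    SET voter_id = voters.number
    FROM voters
    WHERE votings.person_id = voters.person_id
""")
conn.commit()

rows = cur.execute("SELECT id, voter_id FROM votings ORDER BY id").fetchall()
print(rows)
```

Unlike the correlated-subquery version, rows with no matching voter are simply left untouched rather than set to NULL.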
Upvotes: 1