Reputation: 3036
Well i have the following table(info from pgAdmin):
CREATE TABLE comments_lemms
(
comment_id integer,
freq integer,
lemm_id integer,
bm25 real
)
WITH (
OIDS=FALSE
);
ALTER TABLE comments_lemms OWNER TO postgres;
-- Index: comments_lemms_comment_id_idx
-- DROP INDEX comments_lemms_comment_id_idx;
CREATE INDEX comments_lemms_comment_id_idx
ON comments_lemms
USING btree
(comment_id);
-- Index: comments_lemms_lemm_id_idx
-- DROP INDEX comments_lemms_lemm_id_idx;
CREATE INDEX comments_lemms_lemm_id_idx
ON comments_lemms
USING btree
(lemm_id);
And one more table:
CREATE TABLE comments
(
id serial NOT NULL,
nid integer,
userid integer,
timest timestamp without time zone,
lemm_length integer,
CONSTRAINT comments_pkey PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
ALTER TABLE comments OWNER TO postgres;
-- Index: comments_id_idx
-- DROP INDEX comments_id_idx;
CREATE INDEX comments_id_idx
ON comments
USING btree
(id);
-- Index: comments_nid_idx
-- DROP INDEX comments_nid_idx;
CREATE INDEX comments_nid_idx
ON comments
USING btree
(nid);
in comments_lemms there are 8 million entries, in comments - 270 thousands. Im performing the following sql query:
update comments_lemms set bm25=(select lemm_length from comments where id=comment_id limit 1)
And it takes more than 20 minutes of running and i stop it because pgAdmin looks like its about to crash. Is there any way to modify this query or indexes or whatever in my database to speed up things a bit? I have to run some similar queries in future and it's quite painful to wait more than 30 minutes for each one.
Upvotes: 0
Views: 1447
Reputation: 78561
in comments_lemms there are 8 million entries, in comments - 270 thousands. Im performing the following sql query:
update comments_lemms set bm25=(select lemm_length from comments where id=comment_id limit 1)
In other words, you're making it go through 8M entries, and for each row you're doing a nested loop with an index loopup. PG won't rewrite/optimize it because of the limit 1
instruction.
Try this instead:
update comments_lemms set bm25 = comments.lemm_length
from comments
where comments.id = comments_lemms.comment_id;
It should do two seq scans and hash or merge join them together, then proceed with the update in one go.
Upvotes: 1