Anton

Reputation: 3036

Very slow update on a relatively small table in PostgreSQL

Well, I have the following table (info from pgAdmin):

CREATE TABLE comments_lemms
(
  comment_id integer,
  freq integer,
  lemm_id integer,
  bm25 real
)
WITH (
  OIDS=FALSE
);
ALTER TABLE comments_lemms OWNER TO postgres;

-- Index: comments_lemms_comment_id_idx

-- DROP INDEX comments_lemms_comment_id_idx;

CREATE INDEX comments_lemms_comment_id_idx
  ON comments_lemms
  USING btree
  (comment_id);

-- Index: comments_lemms_lemm_id_idx

-- DROP INDEX comments_lemms_lemm_id_idx;

CREATE INDEX comments_lemms_lemm_id_idx
  ON comments_lemms
  USING btree
  (lemm_id);

And one more table:

CREATE TABLE comments
(
  id serial NOT NULL,
  nid integer,
  userid integer,
  timest timestamp without time zone,
  lemm_length integer,
  CONSTRAINT comments_pkey PRIMARY KEY (id)
)
WITH (
  OIDS=FALSE
);
ALTER TABLE comments OWNER TO postgres;

-- Index: comments_id_idx

-- DROP INDEX comments_id_idx;

CREATE INDEX comments_id_idx
  ON comments
  USING btree
  (id);

-- Index: comments_nid_idx

-- DROP INDEX comments_nid_idx;

CREATE INDEX comments_nid_idx
  ON comments
  USING btree
  (nid);

In comments_lemms there are 8 million entries; in comments, 270 thousand. I'm performing the following SQL query:

update comments_lemms set bm25=(select lemm_length from comments where id=comment_id limit 1)

It runs for more than 20 minutes before I stop it, because pgAdmin looks like it's about to crash. Is there any way to modify this query, the indexes, or anything else in my database to speed things up a bit? I have to run similar queries in the future, and waiting more than 30 minutes for each one is quite painful.

Upvotes: 0

Views: 1447

Answers (1)

Denis de Bernardy

Reputation: 78561

In comments_lemms there are 8 million entries; in comments, 270 thousand. I'm performing the following SQL query:

update comments_lemms set bm25=(select lemm_length from comments where id=comment_id limit 1)

In other words, you're making it go through 8M entries, and for each row you're doing a nested loop with an index lookup. PG won't rewrite/optimize it because of the limit 1 instruction.
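You can see this shape in the plan without running the statement; a minimal sketch (the exact output depends on your data and PostgreSQL version):

-- EXPLAIN shows a SubPlan attached to the update, executed once per row
-- of comments_lemms, i.e. ~8M separate index lookups on comments.
EXPLAIN
UPDATE comments_lemms
SET bm25 = (SELECT lemm_length FROM comments WHERE id = comment_id LIMIT 1);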

Try this instead:

update comments_lemms set bm25 = comments.lemm_length
from comments
where comments.id = comments_lemms.comment_id;

It should do two seq scans, hash- or merge-join them together, and then proceed with the update in one go.
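To double-check, you can ask for the plan of the rewritten statement first; and since updating every row leaves a dead version of each old row behind, a VACUUM afterwards helps. A sketch under the same schema, not output from this database:

-- Verify the planner picks a hash or merge join instead of a per-row subplan:
EXPLAIN
UPDATE comments_lemms
SET bm25 = comments.lemm_length
FROM comments
WHERE comments.id = comments_lemms.comment_id;

-- A full-table UPDATE leaves ~8M dead row versions behind; reclaim the space
-- and refresh planner statistics afterwards:
VACUUM ANALYZE comments_lemms;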

Upvotes: 1
