Andrus
Andrus

Reputation: 27909

How to speed up deleting documents without rows

document headers are in omdok table:

create table omdok ( dokumnr serial primary key, ... );

document rows are in omrid table

CREATE TABLE omrid
(
  id serial NOT NULL,
  reanr serial NOT NULL,
  dokumnr integer NOT NULL,
  CONSTRAINT omrid_pkey PRIMARY KEY (id),
  CONSTRAINT omrid_dokumnr_fkey FOREIGN KEY (dokumnr)
      REFERENCES omdok (dokumnr) MATCH SIMPLE
      ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE INITIALLY IMMEDIATE,
  ....
);

Records in omdok which do not have child rows in omrid needs to be deleted

I tried

delete from omdok where dokumnr not in      (select dokumnr from omrid)

Query it is running currently 15 hours and is still running. postgres.exe is using 50% CPU all the time (this is 2 core CPU).

explain delete from omdok where dokumnr not in      (select dokumnr from omrid)

returns:

"Delete  (cost=0.00..21971079433.34 rows=220815 width=6)"
"  ->  Seq Scan on omdok  (cost=0.00..21971079433.34 rows=220815 width=6)"
"        Filter: (NOT (SubPlan 1))"
"        SubPlan 1"
"          ->  Materialize  (cost=0.00..94756.92 rows=1897261 width=4)"
"                ->  Seq Scan on omrid  (cost=0.00..77858.61 rows=1897261 width=4)"

Using

PostgreSQL 9.0.1, compiled by Visual C++ build 1500, 64-bit

Windows 2003 x64 server with 4 GB RAM.

Upvotes: 0

Views: 69

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1269493

Another alternative is to simply create an index on omrid(dokumnr):

create index idx_omrid_dokumnr on omrid(dokumnr);

This should speed the processing of the not in in your original query.

Upvotes: 0

Patrick
Patrick

Reputation: 32161

You do not qualify your DELETE query in any way so the 220,815 documents are compared against the 1,897,261 document rows. That will take time.

The simplest optimization is using a DISTINCT clause on the document rows, which should bring the rows from omrid down by a factor 8 or so:

DELETE FROM omdok WHERE dokumnr NOT IN (SELECT DISTINCT dokumnr FROM omrid);

A potentially faster solution is to first identify the documents without rows, then delete those rows:

WITH docs0rows AS (
  SELECT dokumnr
  FROM omdok d
  LEFT JOIN (SELECT DISTINCT dokumnr FROM omrid) dr ON dr.dokumnr = d.dokumnr
  WHERE dr.dokumnr IS NULL
)
DELETE FROM omdok d
USING docs0rows zero
WHERE d.dokumnr = zero.dokumnr;

Disclaimer: Test this command before you run it to see which rows will be deleted.

Upvotes: 1

Related Questions