dMb
dMb

Reputation: 9337

postgres - slow outer query when using a primary key

Using Postgres v. 8.4.2.

I have a standard three-table setup for a many-to-many relationship. A query, written against two of these tables, is supposed to return a single record for which there is no existing mapping:

SELECT b.id
   FROM scm_branch b
      LEFT OUTER JOIN "scm_branchgroup_branches" bgb
         ON b.id = bgb.branch_id
   WHERE
         bgb.branch_id is NULL
   LIMIT 1

(scm_branchgroup_branches is the ternary table keeping the mapping records)

this query has the following top level cost:

Limit (cost=0.00..0.22 rows=1 width=4)

However if I change the field used in the WHERE condition to be the primary key of the bgb table, the cost goes drastically up:

SELECT b.id
   FROM scm_branch b
      LEFT OUTER JOIN "scm_branchgroup_branches" bgb
         ON b.id = bgb.branch_id
   WHERE
         bgb.id IS NULL 
   LIMIT 1

(note: this query uses "bgb.id IS NULL" vs "bgb.branch_id is NULL" used in the previous example)

Top level cost:

Limit (cost=236366.74..4644900.75 rows=1 width=4)

Why such a drastic difference? I mean I see the difference in execution plans, but I don't understand the underlying reasons for the difference.

I'll be happy to provide any additional info needed to answer this question.

Thanks D.

Upvotes: 1

Views: 424

Answers (1)

ruakh
ruakh

Reputation: 183321

In the version with branch_id IS NULL, PostgreSQL recognizes that you're doing an anti-join, and it optimizes accordingly, using a "hash-join"-type algorithm. (Google PostgreSQL hash left anti-join for lots of information on this optimization.) In the version with id IS NULL, however, it doesn't recognize this fact, and its optimizations are not nearly so helpful. What's more, the join basically forces it to use the index on branch_id, so it can't take advantage of the primary-key index on id, and must consult the actual table data to determine if a given record satisfies the WHERE-clause.

(There may well be other factors at play as well — I'm not an expert on PostgreSQL — but I believe those are the main factors.)

Upvotes: 1

Related Questions