Reputation: 4070

Postgresql select where one of two columns in set

My tables :

R1(id serial, a int , b int) - 100k records
R2(id,serial,VAL unique) - 40k records

Indexes : r2(VAL),r1(a),r1(b)

My goal : get all records from R1 where a or b equals VAL in R2

I thought about 2 options :

select * from R1 where R1.a in (select VAL from R2) or R1.b in (select VAL from R2);

select * from R1 where exists (select * from R2 where R1.a = R2.VAL or R1.b = R2.VAL);

In my specific case:

The first query uses seq scan on R1 and hash scan on r2.
The second scan uses seq scan on R1 and bitmap scan on r2 with the index R2(VAL).

I wanted to hear if I can rewrite the query in a better way in order to gain performance. By the way I'm using pg 9.6

Thanks..

Upvotes: 0

Answers (4)

JeyJ

Reputation: 4070

It seems like in this specific case the first query I suggested in my main comment provides the best performance :

select * from R1 where R1.a in (select VAL from R2) or R1.b in (select VAL from R2);

@klin prepared a dbfiddle with all the execution plans if someone want to have a look : https://dbfiddle.uk/?rdbms=postgres_9.5&fiddle=5b50f605262e406c0bab13710ec121d5

Upvotes: 0

forpas

Reputation: 164099

Use a CTE that returns all the VALs from R2 (so you scan R2 only once).
But maybe this is not needed if the optimizer does it already for select VAL from R2.
Then use CASE to check the 2 separate cases, so the 2nd case (WHEN b IN (SELECT VAL FROM cte)) will be evaluated only when the 1st case (WHEN a IN (SELECT VAL FROM cte)) fails.

WITH cte AS (SELECT VAL FROM R2)
SELECT * FROM R1
WHERE 1 = CASE
  WHEN a IN (SELECT VAL FROM cte) THEN 1
  WHEN b IN (SELECT VAL FROM cte) THEN 1
  ELSE 0
END

Upvotes: 0

Gordon Linoff

Reputation: 1269873

I would recommend:

select r1.*
from R1
where exists (select 1 from r2 where r1.a = r2.val) or
      exists (select 1 from r2 where r1.b = r2.val);

Then, you want an index on r2(val).

If r2 is really small and r1 is really big and there are separate indexes on r1(a) and r1(b), then this might be faster:

select r1.*
from r1 join
     r2
     on r1.a = r2.val
union
select r1.*
from r1 join
     r2
     on r1.b = r2.val;

Upvotes: 1

MMV

Reputation: 980

And how about

select * from r1 join r2 on r1.a=r2.val
union
select * from r1 join r2 on r1.b=r2.val

Upvotes: 0

Postgresql select where one of two columns in set

Answers (4)

Related Questions