Is it faster to do WHERE IN or INNER JOIN in Redshift

Question

I have two tables in Redshift:

table1

|   ids |
|------:|
|     1 |
|     2 |
|     6 |
|     9 |
|    12 |

table2

|   id |    value |
|-----:|---------:|
|    1 | 0.134435 |
|    2 | 0.767417 |
|    3 | 0.779567 |
|    4 | 0.726051 |
|    5 | 0.405138 |
|    6 | 0.775206 |
|    7 | 0.699945 |
|    8 | 0.499433 |
|   10 | 0.457386 |
|    9 | 0.227511 |
|   10 | 0.369292 |
|   11 | 0.653735 |
|   12 | 0.537251 |
|    2 | 0.953539 |
|   13 | 0.377625 |
|   14 | 0.973905 |
|    4 | 0.104643 |
|    1 | 0.450627 |

I want to get the rows of table2 whose id appears in table1, and I see two ways to do it:

```sql
SELECT *
FROM table2
WHERE id IN (SELECT ids FROM table1)
```

or

```sql
SELECT t2.id, t2.value
FROM table2 t2
INNER JOIN table1 t1
  ON t2.id = t1.ids
```

I want to know if there is any performance difference between them.

(I know I could just test this example to find out, but I would like to know whether one of them is always faster.)

Edit: table1.ids is a unique column.
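As a quick check that the two queries return the same result set on the sample data, here is a minimal sketch that loads both tables into an in-memory SQLite database and runs the question's two queries verbatim. SQLite is only a stand-in for Redshift (an assumption of this sketch): it confirms that the results match when table1.ids is unique, but it says nothing about Redshift performance. The helper names `build_db` and `run` are hypothetical.

```python
import sqlite3

# Sample data copied from the question; table2 repeats some ids on purpose.
TABLE1_IDS = [1, 2, 6, 9, 12]
TABLE2_ROWS = [
    (1, 0.134435), (2, 0.767417), (3, 0.779567), (4, 0.726051),
    (5, 0.405138), (6, 0.775206), (7, 0.699945), (8, 0.499433),
    (10, 0.457386), (9, 0.227511), (10, 0.369292), (11, 0.653735),
    (12, 0.537251), (2, 0.953539), (13, 0.377625), (14, 0.973905),
    (4, 0.104643), (1, 0.450627),
]

# The two queries from the question.
IN_QUERY = "SELECT * FROM table2 WHERE id IN (SELECT ids FROM table1)"
JOIN_QUERY = """
    SELECT t2.id, t2.value
    FROM table2 t2
    INNER JOIN table1 t1 ON t2.id = t1.ids
"""


def build_db(table1_ids):
    """Create an in-memory SQLite database (stand-in for Redshift) with both tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (ids INTEGER)")
    conn.execute("CREATE TABLE table2 (id INTEGER, value REAL)")
    conn.executemany("INSERT INTO table1 VALUES (?)", [(i,) for i in table1_ids])
    conn.executemany("INSERT INTO table2 VALUES (?, ?)", TABLE2_ROWS)
    return conn


def run(conn, sql):
    # Neither query guarantees row order, so sort before comparing.
    return sorted(conn.execute(sql).fetchall())


conn = build_db(TABLE1_IDS)
in_rows = run(conn, IN_QUERY)
join_rows = run(conn, JOIN_QUERY)
print(len(in_rows), len(join_rows))  # 7 7
print(in_rows == join_rows)          # True
```

Both queries return the same 7 rows, including both table2 rows for id 1 and both for id 2, because each table2 row matches at most one row of a unique table1.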

Gordon Linoff · Accepted Answer

The two queries do different things.

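The JOIN can multiply rows: if a value appears more than once in table1.ids, every matching row of table2 is returned once per occurrence.

The IN only tests membership, so it never duplicates rows of table2.

If table1.ids can contain duplicates, use the version that returns the rows you actually want. If it is guaranteed to be unique, as your edit states, the two queries are functionally equivalent.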
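To make the row-multiplication point concrete, here is a minimal sketch using a hypothetical table1 in which the id 2 appears twice (so ids is no longer unique). It again uses an in-memory SQLite database as a stand-in for Redshift (an assumption of this sketch); only the row counts matter here, not timing.

```python
import sqlite3

# Hypothetical table1: the id 2 is listed twice, so ids is NOT unique.
TABLE1_IDS = [1, 2, 2, 6, 9, 12]
# table2 exactly as given in the question.
TABLE2_ROWS = [
    (1, 0.134435), (2, 0.767417), (3, 0.779567), (4, 0.726051),
    (5, 0.405138), (6, 0.775206), (7, 0.699945), (8, 0.499433),
    (10, 0.457386), (9, 0.227511), (10, 0.369292), (11, 0.653735),
    (12, 0.537251), (2, 0.953539), (13, 0.377625), (14, 0.973905),
    (4, 0.104643), (1, 0.450627),
]

conn = sqlite3.connect(":memory:")  # SQLite stands in for Redshift here
conn.execute("CREATE TABLE table1 (ids INTEGER)")
conn.execute("CREATE TABLE table2 (id INTEGER, value REAL)")
conn.executemany("INSERT INTO table1 VALUES (?)", [(i,) for i in TABLE1_IDS])
conn.executemany("INSERT INTO table2 VALUES (?, ?)", TABLE2_ROWS)

in_rows = conn.execute(
    "SELECT * FROM table2 WHERE id IN (SELECT ids FROM table1)"
).fetchall()
join_rows = conn.execute(
    "SELECT t2.id, t2.value FROM table2 t2 INNER JOIN table1 t1 ON t2.id = t1.ids"
).fetchall()

# IN only tests membership, so each table2 row appears at most once: 7 rows.
# The JOIN pairs each table2 row with every matching table1 row, so the two
# table2 rows with id 2 each come back twice: 7 + 2 = 9 rows.
print(len(in_rows), len(join_rows))  # 7 9
```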

In my experience, JOIN is typically at least as fast as IN. Of course, you should test on your own data, but that is a reasonable starting point.
