Reputation: 4618
I have 2 tables in redshift:
table1
| ids |
|------:|
| 1 |
| 2 |
| 6 |
| 9 |
| 12 |
table2
| id | value |
|-----:|---------:|
| 1 | 0.134435 |
| 2 | 0.767417 |
| 3 | 0.779567 |
| 4 | 0.726051 |
| 5 | 0.405138 |
| 6 | 0.775206 |
| 7 | 0.699945 |
| 8 | 0.499433 |
| 10 | 0.457386 |
| 9 | 0.227511 |
| 10 | 0.369292 |
| 11 | 0.653735 |
| 12 | 0.537251 |
| 2 | 0.953539 |
| 13 | 0.377625 |
| 14 | 0.973905 |
| 4 | 0.104643 |
| 1 | 0.450627 |
And I basically want to get the rows in table2 where id is in table1 and I have 2 possibilities:
SELECT *
FROM table2
WHERE id IN (SELECT ids FROM table1)
or
SELECT t2.id, t2.value
FROM table2 t2
INNER JOIN table1 t1
ON t2.id = t1.ids
I want to know if there is any performance difference between them.
(I know I could just test in this example to find out but I would like to know if there is one which is always faster)
Edit: table1.ids
is a unique column
Upvotes: 2
Views: 2403
Reputation: 1270713
The two queries do different things.
The JOIN
can multiply the number of rows if id
is duplicated in table1
.
The IN
will never duplicate rows.
If id
can be duplicated, you should use the version that does what you want. If id
is guaranteed to be unique, then the two are functionally equivalent.
In my experience, JOIN
is typically at least as fast a IN
. Of course, you can test on your data, but that is a starting point.
Upvotes: 4