Reputation: 101
In BigQuery, what's the most efficient way (from performance standpoint) to complete the following task?
Objective: Select rows from table_a that match with US-based rows in table_b.
I see at least three different ways to go about this task.
1) Using a subquery to filter
SELECT * FROM table_a
JOIN (select * from table_b where country='US') table_b
ON table_a.userid = table_b.userid
2) Using join clauses to filter
SELECT * FROM table_a
JOIN table_b
ON table_a.userid = table_b.userid
AND table_b.country='US'
3) Adding where clause at end
SELECT * FROM table_a
JOIN table_b
ON table_a.userid = table_b.userid
WHERE table_b.country='US'
Upvotes: 1
Views: 6587
Reputation: 31
The query with a subquery will be the slowest, then the 3rd query and the most optimised one is the 2nd one as it only uses join whereas the 3rd query uses both join and a WHERE clause.
You can refer to these stackoverflow posts, to get a deeper understanding. Difference between joins and subquery : Join vs. sub-query Difference between joins and where clause : Is a JOIN faster than a WHERE?
Hope this helps !
Upvotes: 0
Reputation: 172994
In your case - I would go with below to narrow down JOIN volume as early as possible. Even though BigQuery Engine is smart enought to come up with this optimization by its own for you no matter which version (of query out of three in question) you use
SELECT *
FROM table_a
JOIN (SELECT * FROM table_b WHERE country='US') table_b
ON table_a.userid = table_b.userid
but if you need only rows from table_a - I would go with something like
SELECT a.*
FROM table_a a
JOIN (SELECT DISTINCT userid FROM table_b WHERE country='US') table_b
ON a.userid = table_b.userid
Upvotes: 0
Reputation: 33705
If you use standard SQL for these queries, then they will all execute in the same way, which you can confirm by looking at the query plan explanation after executing them. Specifically, BigQuery applies the following transformations:
INNER JOIN
s, so putting the WHERE
in a subquery prior to the join is analogous to having it appear outside the join.ON
clause, so ON table_b.country='US'
is treated the same as WHERE table_b.country='US'
.If you use legacy SQL instead, however, then you need to move the condition inside the subquery prior to the join, since BigQuery does not "push down" filters through joins when using legacy SQL.
Upvotes: 5