Reputation: 2316
I have two queries running in BQ. I expect them having the same results, but it turns out not. I don't understand why it happens.
Query 1:
select a.* from
(
select *, _PARTITIONTIME as pt from
`table_A`
where
p > 5 and q != 0
) as a
left join
(select distinct x, pt from `table_b`) as b
on join_clauses
Query 2:
select a.* from
table_A as a
left join
(select distinct x, pt from table_b) as b
on join_clauses
and a.p > 5 and a.q != 0
If I remove the conditions of a.p > 5 and a.q != 0
from both queries, the results are identical.
As far as I know, the conditions under the on
statements is evaluated before the join, which should make the two queries equal. Then I don't know why do I see different results here. Is there any special setup in Google's BQ?
Upvotes: 0
Views: 1556
Reputation: 173003
In your Query#2 - result will contain ALL rows from table_A
being left join to second sub-query. So, for each row in table_a
output will have as many respective rows as many matches of on join_clauses and a.p > 5 and a.q != 0
exists in second sub-query - BUT at least one (in case if no matches for that row - this is because of LEFT JOIN used)
In Query #1 - first you filter out from table_a
all rows which don't match WHERE clause p > 5 and q != 0
and only then you apply on join_clauses
so you start with subset of rows in table_a
thus the final output has lower rows than for Query #2
Run these two queries which are simplified version of yours to see this
Analogue of Query #2
#standardSQL
WITH table_a AS (
SELECT 1 id, 1 p, 0 q UNION ALL
SELECT 1, 6, 1 UNION ALL
SELECT 2, 7, 2
), table_b AS (
SELECT 1 id UNION ALL
SELECT 2
)
SELECT a.*, b.id idb
FROM table_a a
LEFT JOIN table_b b
ON a.id = b.id
AND a.p > 5 AND a.q != 0
with result
Row id p q idb
1 1 1 0 null
2 1 6 1 1
3 2 7 2 2
and analogue of Query #1
#standardSQL
WITH table_a AS (
SELECT 1 id, 1 p, 0 q UNION ALL
SELECT 1, 6, 1 UNION ALL
SELECT 2, 7, 2
), table_b AS (
SELECT 1 id UNION ALL
SELECT 2
)
SELECT a.*, b.id idb
FROM (
SELECT * FROM table_a
WHERE p > 5 AND q != 0
) a
LEFT JOIN table_b b
ON a.id = b.id
with result
Row id p q idb
1 1 6 1 1
2 2 7 2 2
comparing those two - you can easily see the difference in result and hopefully see now why :o)
Upvotes: 1
Reputation: 1269953
This has nothing to do with BigQuery, only with your understanding of LEFT JOIN
.
A LEFT JOIN
keeps all rows in the first table, regardless of whether the ON
clause evaluates to true, false, or NULL
.
That means that filters on the first table have no impact in the FROM
clause. Well, they do have an impact -- otherwise matching rows from the second table will have NULL
values. However, hardly anyone notices or desires that effect.
The rules for outer joins are simple:
LEFT JOIN
, filters on the first table go in the WHERE
clause.LEFT JOIN
, filters on subsequent tables go in the ON
clause (otherwise the outer join becomes an inner join).RIGHT JOIN
. . . replace with LEFT JOIN
. (This can usually be done and most people find the query to be easier to follow.)Upvotes: 0