Reputation: 17676
I have a SQL query I do not quite understand regarding operator precedence:
SELECT
foo,
count(*)
FROM
A
JOIN (SELECT
SUM(IF(bar = 2,1,0)) as bar_sum,
SUM(IF(foo >= 1,1,0)) as foo,
SUM(1) as sum_1
FROM
B
) as sums
GROUP BY
id,
bar_sum,
foo,
sum_1
ON A.id = B.id
does the GROUP BY
from outer brace really apply to the inner one?
Note, I need to port this SQL from Hive to Spark scala Dataframe API so I really need to get operator precedence right. From What is the execution sequence of Group By, Having and Where clause in SQL Server? it looks like in general this is true, but I did not find any documentation regarding the ()
.
Upvotes: 0
Views: 366
Reputation: 175586
This query looks incorrect (there is missing ON
clause)
SELECT foo,count(*)
FROM A
JOIN (SELECT SUM(IF(bar = 2,1,0)) as bar_sum,
SUM(IF(foo >= 1,1,0)) as foo,
SUM(1) as sum_1
FROM B) as sums -- should be `ON`
GROUP BY id, bar_sum, foo, sum_1;
-- looks like grouping by sum_1, bar_sum is superflous
The GROUP BY
is applied only to outer query. Please note that inner query will return single row.
SELECT SUM(IF(bar = 2,1,0)) as bar_sum,
SUM(IF(foo >= 1,1,0)) as foo,
SUM(1) as sum_1
FROM B
-- single row
Then you join single row to table A and get as many rows as distinct (id,foo) values.
Upvotes: 1