Reputation: 53
When writing correlated sub-queries in BigQuery using a WHERE clause, it is resulting in an Error. LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join.
However it works in other SQL engines, I am not understanding the error or how to rewrite the queries.
Different queries of the same type causing the same error.
SELECT
out.*
FROM
`august-monolith-205810.Assignment.students` AS out
WHERE
2 >= (
SELECT
COUNT(*)
FROM
`august-monolith-205810.Assignment.students` AS t1
WHERE
out.age < t1.age)
reporting students who are older than at most two other students.
Upvotes: 5
Views: 3726
Reputation: 173210
Below is for BigQuery Standard SQL
#standardSQL
SELECT * EXCEPT(cnt)
FROM (
SELECT *, COUNT(1) OVER(ORDER BY age RANGE BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) cnt
FROM `august-monolith-205810.Assignment.students`
)
WHERE cnt BETWEEN 1 AND 2
Another, less effective and thus not recommended but still an option is to use CROSS JOIN to mimic non-equality LEFT JOIN
#standardSQL
SELECT ANY_VALUE(out).*
FROM `august-monolith-205810.Assignment.students` AS out
CROSS JOIN `august-monolith-205810.Assignment.students` AS t1
WHERE out.age > t1.age
GROUP BY out.id
HAVING COUNT(t1.age) <= 2
Upvotes: 4
Reputation: 48179
Horrible thing you are trying to do, but lets think about what you DO have. For every "OUT" record, you are requerying every assignment again to get the count. What I would suggest is a 3-level query. The inner-most is to get all distinct "age" values (hoping integer based). Then get the counts LESS than to create a pre-aggregate result. Then you can join to the original out based on the outer-most OUT value vs repeated querying. Something like
select
out
from
`august-monolith-205810.Assignment.students` AS out
JOIN ( select
UniqAge.Age,
( SELECT COUNT(*)
FROM `august-monolith-205810.Assignment.students`
WHERE age < UniqAge.age) CountUnderAge
from
( select distinct age
from `august-monolith-205810.Assignment.students` ) UniqAge
) PQByAge
on out.age = PQByAge
AND PQByAge.CountUnderAge <= 2
This query is based on the Age column being INTEGER based (non decimal).
Upvotes: 0