Theo Hayes
Theo Hayes

Reputation: 53

Correlated sub-query causing this error - LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join

When writing correlated sub-queries in BigQuery using a WHERE clause, it is resulting in an Error. LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join.

However it works in other SQL engines, I am not understanding the error or how to rewrite the queries.

Different queries of the same type causing the same error.

SELECT
  out.*
FROM
  `august-monolith-205810.Assignment.students` AS out
WHERE
  2 >= (
  SELECT
    COUNT(*)
  FROM
    `august-monolith-205810.Assignment.students` AS t1
  WHERE
    out.age < t1.age)

reporting students who are older than at most two other students.

Upvotes: 5

Views: 3726

Answers (2)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173210

Below is for BigQuery Standard SQL

#standardSQL
SELECT * EXCEPT(cnt)
FROM (
  SELECT *, COUNT(1) OVER(ORDER BY age RANGE BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) cnt
  FROM `august-monolith-205810.Assignment.students` 
)
WHERE cnt BETWEEN 1 AND 2

Another, less effective and thus not recommended but still an option is to use CROSS JOIN to mimic non-equality LEFT JOIN

#standardSQL
SELECT ANY_VALUE(out).*
FROM `august-monolith-205810.Assignment.students` AS out
CROSS JOIN `august-monolith-205810.Assignment.students` AS t1
WHERE out.age > t1.age 
GROUP BY out.id
HAVING COUNT(t1.age) <= 2

Upvotes: 4

DRapp
DRapp

Reputation: 48179

Horrible thing you are trying to do, but lets think about what you DO have. For every "OUT" record, you are requerying every assignment again to get the count. What I would suggest is a 3-level query. The inner-most is to get all distinct "age" values (hoping integer based). Then get the counts LESS than to create a pre-aggregate result. Then you can join to the original out based on the outer-most OUT value vs repeated querying. Something like

select
      out
   from
      `august-monolith-205810.Assignment.students` AS out
         JOIN ( select
                      UniqAge.Age,
                      ( SELECT COUNT(*) 
                           FROM `august-monolith-205810.Assignment.students`
                           WHERE age < UniqAge.age) CountUnderAge
                   from
                      ( select distinct age
                           from `august-monolith-205810.Assignment.students` ) UniqAge
              ) PQByAge
        on out.age = PQByAge
        AND PQByAge.CountUnderAge <= 2

This query is based on the Age column being INTEGER based (non decimal).

Upvotes: 0

Related Questions