Reputation: 95
I have two temporary tables table_a
and table_b
and trying to get this query and all of its conditions to work properly.
SELECT DISTINCT CASE WHEN a.id IS NULL THEN b.id ELSE a.id END id,
CASE WHEN a.num IS NULL THEN b.num ELSE a.num END num,
CASE WHEN a.testdate IS NULL THEN b.testdate ELSE a.testdate END testdate
FROM table_a a
FULL OUTER JOIN table_b b
ON (a.id=b.id AND a.num=b.num AND a.testdate=b.testdate)
WHERE
(CASE WHEN a.t_amt IS NULL THEN 0 ELSE a.t_amt END)
<>
(CASE WHEN b.t_amt IS NULL THEN 0 ELSE b.t_amt END) OR
(CASE WHEN a.qty IS NULL THEN 0 ELSE a.qty END)
<>
(CASE WHEN b.qty IS NULL THEN 0 ELSE b.qty END)
ORDER BY
CASE WHEN a.id IS NULL THEN b.id ELSE a.id END,
CASE WHEN a.num IS NULL THEN b.num ELSE a.num END,
CASE WHEN a.testdate IS NULL THEN b.testdate ELSE a.testdate END
Executing the above query on these two tables using SparkSQL
produces the following error
sqlq = <the sql from above>
df = sqlContext.sql(sqlq)
"AnalysisException: u"cannot resolve '
a.id
' given input columns: [id, num, testdate];"
Upvotes: 1
Views: 2412
Reputation: 3848
It seems your error is in the ORDER BY
clause, as it has no concept of tables a
and b
, but only the names and aliases in the SELECT
clause.
This makes quite good sense since you should really only order your results based on columns actually in the result set.
SELECT DISTINCT (CASE WHEN a.id IS NULL THEN b.id ELSE a.id END) AS id,
(CASE WHEN a.num IS NULL THEN b.num ELSE a.num END) AS num,
(CASE WHEN a.testdate IS NULL THEN b.testdate ELSE a.testdate END) AS testdate
FROM table_a AS a
FULL OUTER JOIN table_b AS b
ON (a.id=b.id AND a.num=b.num AND a.testdate=b.testdate)
WHERE
(CASE WHEN a.t_amt IS NULL THEN 0 ELSE a.t_amt END) <> (CASE WHEN b.t_amt IS NULL THEN 0 ELSE b.t_amt END)
OR
(CASE WHEN a.qty IS NULL THEN 0 ELSE a.qty END) <> (CASE WHEN b.qty IS NULL THEN 0 ELSE b.qty END)
ORDER BY id, num, testdate
Upvotes: 1