Reputation: 91
I have the following Hive query:
select *
from A
left outer join B
on A.ID = B.ID
where B.ID IS NULL
The result produces duplicate data but I need only non-duplicate records.
After some research, I tried the below query:
select *
from (
select *
from A
left outer join on B
where A.ID = B.ID AND B.ID IS NULL ) join_result
group by jojn_result.ID
It's showing an ambiguous column reference ID error.
I do not have the columns name of table A.
Please help me to identify the solution to this .
Thank you .
Upvotes: 1
Views: 1627
Reputation: 46
One of your join columns may have NULL
values. Whenever there is NULL
in any of the join key values, it will skip that column. Try replacing the NULL
with some default value while joining using NVL
or COALESCE
. I was looking for same answer and saw your post here. But there was no solution. But since i found the solution I just wanted to post here so that someone can benefit.
select *
from A
left outer join B
on coalesce(A.ID,000) = coalesce(B.ID,000)
where B.ID IS NULL
Upvotes: 0
Reputation: 1269773
Hmmm . . . How about select
:
Select A.*
from A left outer join
B
on A.ID = B.ID
where B.ID IS NULL;
I removed the B
columns because they are not needed.
Upvotes: 1