Reputation: 91

Select non duplicate records from a hive join query

I have the following Hive query:

select *
from A
left outer join B
on A.ID = B.ID
where B.ID IS NULL

The result produces duplicate data but I need only non-duplicate records.

After some research, I tried the below query:

select *
from (
    select *
    from A
    left outer join on B
    where A.ID = B.ID AND B.ID IS NULL ) join_result
group by jojn_result.ID

It's showing an ambiguous column reference ID error.

I do not have the columns name of table A.

Please help me to identify the solution to this .

Thank you .

Upvotes: 1

Answers (2)

Jaysheel Kalgal

Reputation: 46

One of your join columns may have NULL values. Whenever there is NULL in any of the join key values, it will skip that column. Try replacing the NULL with some default value while joining using NVL or COALESCE. I was looking for same answer and saw your post here. But there was no solution. But since i found the solution I just wanted to post here so that someone can benefit.

select *
from A
left outer join B
on coalesce(A.ID,000) = coalesce(B.ID,000)
where B.ID IS NULL

Upvotes: 0

Gordon Linoff

Reputation: 1269773

Hmmm . . . How about select:

Select A.*
from A left outer join
     B
     on A.ID = B.ID
where B.ID IS NULL;

I removed the B columns because they are not needed.

Upvotes: 1

Select non duplicate records from a hive join query

Answers (2)

Related Questions