Reputation: 2668
I have two tables A and B that both have a column id. I wish to obtain ids from A that are not present in B. The obvious way is:
SELECT id FROM A WHERE id NOT IN (SELECT id FROM B)
Unfortunately, Hive doesn't support in, exists or subqueries. Is there a way to achieve the above using joins?
I thought of the following
SELECT A.id FROM A,B WHERE A.id<>B.id
But it seems like this will return the entirety of A, since there always exists an id in B that is not equal to any id in A.
Upvotes: 11
Views: 48221
Reputation: 11
if you can using spark sql you can use left anti join.
ex: SELECT A.id FROM A left anti join B on a.id=b.id
Upvotes: 1
Reputation: 81
Hive seems to support IN
, NOT IN
, EXIST
and NOT EXISTS
from 0.13.
select count(*)
from flight a
where not exists(select b.tailnum from plane b where b.tailnum = a.tailnum);
The subqueries in EXIST
and NOT EXISTS
should have correlated predicates (like b.tailnum = a.tailnum
in above sample)
For more, refer Hive Wiki > Subqueries in the WHERE Clause
Upvotes: 6
Reputation: 91
Should you ever want to do an IN
as so:
SELECT id FROM A WHERE id IN (SELECT id FROM B)
Hive has this covered with a LEFT SEMI JOIN
:
SELECT a.key, a.val
FROM a LEFT SEMI JOIN b on (a.key = b.key)
Upvotes: 3
Reputation: 41428
You can do the same with a LEFT OUTER JOIN
in Hive:
SELECT A.id
FROM A
LEFT OUTER JOIN B
ON (B.id = A.id)
WHERE B.id IS null
Upvotes: 26