Reputation: 173
How do I do an EXCEPT clause (like SQL) in Hive QL?
I have two tables, each consisting of a single column of unique ids.
I want to find the list of ids that are in table 1 but not in table 2.
Table 1
apple
orange
pear
Table 2
apple
orange
In SQL you can use an EXCEPT clause (http://en.wikipedia.org/wiki/Set_operations_%28SQL%29), but you can't do that in Hive QL.
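For reference, this is what EXCEPT returns on the example data in an engine that supports it. It is a minimal sketch using Python's built-in sqlite3 module, with SQLite standing in for a database that has EXCEPT (it is not Hive); the table and column names table1, table2 and id are assumed from the question.

```python
import sqlite3

# In-memory copies of the two example tables (names assumed from the question).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id TEXT)")
conn.execute("CREATE TABLE table2 (id TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?)", [("apple",), ("orange",), ("pear",)])
conn.executemany("INSERT INTO table2 VALUES (?)", [("apple",), ("orange",)])

# EXCEPT keeps the distinct rows of the first query that the second query lacks.
result = [row[0] for row in conn.execute(
    "SELECT id FROM table1 EXCEPT SELECT id FROM table2 ORDER BY id")]
print(result)  # ['pear']
```

Any Hive workaround should return the same single id, pear.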
Upvotes: 14
Views: 30453
Reputation: 1401
1:
select distinct id from table1 where id not in (select distinct id from table2)
2:
select t1.id
from table1 as t1
left join table2 as t2
on t1.id = t2.id
where t2.id is null
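Both queries can be checked against the example data, which also exposes one difference between them. This sketch uses Python's sqlite3 module as a stand-in for Hive (an assumption; it is not Hive itself), and the helpers make_db and ids are hypothetical. Under standard SQL three-valued logic, which SQLite follows, a single NULL in table2.id makes the NOT IN version return no rows at all, while the LEFT JOIN version is unaffected; it is worth confirming the same behaviour on your Hive version.

```python
import sqlite3


def make_db(table2_ids):
    """Create in-memory copies of the question's tables (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (id TEXT)")
    conn.execute("CREATE TABLE table2 (id TEXT)")
    conn.executemany("INSERT INTO table1 VALUES (?)",
                     [(i,) for i in ("apple", "orange", "pear")])
    conn.executemany("INSERT INTO table2 VALUES (?)", [(i,) for i in table2_ids])
    return conn


def ids(conn, sql):
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql)]


NOT_IN = """
SELECT DISTINCT id FROM table1
WHERE id NOT IN (SELECT DISTINCT id FROM table2)
ORDER BY id
"""

LEFT_JOIN = """
SELECT t1.id
FROM table1 AS t1
LEFT JOIN table2 AS t2 ON t1.id = t2.id
WHERE t2.id IS NULL
ORDER BY t1.id
"""

clean = make_db(["apple", "orange"])
print(ids(clean, NOT_IN), ids(clean, LEFT_JOIN))  # ['pear'] ['pear']

# With a NULL in table2, NOT IN evaluates to NULL (unknown) instead of true
# for ids missing from table2, so no row passes; the LEFT JOIN is unaffected.
with_null = make_db(["apple", "orange", None])
print(ids(with_null, NOT_IN), ids(with_null, LEFT_JOIN))  # [] ['pear']
```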
Upvotes: 0
Reputation: 139
We can use a NOT EXISTS clause in Hive as an equivalent of MINUS (EXCEPT).
SELECT t1.id FROM t1 WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t2.id = t1.id);
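The correlated NOT EXISTS form can be checked the same way. This is a minimal sketch, again using SQLite through Python's sqlite3 module as a stand-in for Hive, with tables named t1 and t2 as in the query above. A NULL id is deliberately included in t2 to show that, unlike NOT IN, NOT EXISTS still returns pear.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id TEXT)")
conn.execute("CREATE TABLE t2 (id TEXT)")
conn.executemany("INSERT INTO t1 VALUES (?)", [("apple",), ("orange",), ("pear",)])
# NULL included on purpose: it never equals t1.id, so it cannot block a row.
conn.executemany("INSERT INTO t2 VALUES (?)", [("apple",), ("orange",), (None,)])

# Correlated subquery: keep a t1 row only if no t2 row has the same id.
NOT_EXISTS = """
SELECT t1.id FROM t1
WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t2.id = t1.id)
ORDER BY t1.id
"""
result = [row[0] for row in conn.execute(NOT_EXISTS)]
print(result)  # ['pear']
```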
Upvotes: 2
Reputation: 726
I don't think there's any built-in way to do this, but a LEFT OUTER JOIN should do the trick. This selects all ids from table1 that do not exist in table2:
SELECT t1.id FROM table1 t1 LEFT OUTER JOIN table2 t2 ON (t1.id=t2.id) WHERE t2.id IS NULL;
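One difference from EXCEPT is worth knowing: EXCEPT returns distinct rows, while the LEFT OUTER JOIN anti-join returns every unmatched row of table1, duplicates included. With unique ids, as in the question, the two agree. The sketch below uses SQLite via Python's sqlite3 as a stand-in for Hive and adds a hypothetical duplicate pear row to table1; it shows that adding DISTINCT makes the join match EXCEPT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id TEXT)")
conn.execute("CREATE TABLE table2 (id TEXT)")
# Hypothetical duplicate 'pear' row, unlike the unique ids in the question.
conn.executemany("INSERT INTO table1 VALUES (?)",
                 [("apple",), ("orange",), ("pear",), ("pear",)])
conn.executemany("INSERT INTO table2 VALUES (?)", [("apple",), ("orange",)])


def ids(sql):
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql)]


# Anti-join as written in the answer: unmatched rows come back once per row.
anti_join = ids("""
SELECT t1.id FROM table1 t1 LEFT OUTER JOIN table2 t2 ON (t1.id = t2.id)
WHERE t2.id IS NULL ORDER BY t1.id
""")
# Same join with DISTINCT, which reproduces EXCEPT's set semantics.
anti_join_distinct = ids("""
SELECT DISTINCT t1.id FROM table1 t1 LEFT OUTER JOIN table2 t2 ON (t1.id = t2.id)
WHERE t2.id IS NULL ORDER BY t1.id
""")
except_result = ids("SELECT id FROM table1 EXCEPT SELECT id FROM table2 ORDER BY id")

print(anti_join)           # ['pear', 'pear']
print(anti_join_distinct)  # ['pear']
print(except_result)       # ['pear']
```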
Upvotes: 28