Reputation: 127
I have two hive tables -
Table1: col1 col2 --------- 1 b 2 c 3 d
Table2: col1 col2 col3 ---------------- 1 b x 1 b y 1 b z 2 c x 3 d x
Here as you can see in Table 2 I have duplicate values for col1 and col2. I want to remove these duplicates when I do select. I dont care about col3.
I am looking for select query which will give me -
col1 col2 col3 ---------------- 1 b y 2 c x 3 d x
I tried using different JOINs but end up getting duplicate values.
Upvotes: 2
Views: 2134
Reputation: 31648
use GROUP BY
along with JOIN
SELECT a.col1,
a.col2,
MAX(b.col3) AS col3
FROM Table1 a
JOIN Table2 b
ON a.col1 = b.col1
AND a.col2 = b.col2
GROUP BY a.col1,
a.col2;
I don't care about col3.
So, if you don't want it in output, you could simply use EXISTS
SELECT a.col1,
a.col2
FROM table1 a
WHERE EXISTS (SELECT 1
FROM table2 b
WHERE a.col1 = b.col1);
Upvotes: 2