Rishabh Dixit

Reputation: 115

Hive - Select distinct unique IDs per date without creating external tables or using JOINS

I am working on a data set with the following columns:

unique_ID       Date
a               2018_09_08
a               2018_09_18
a               2018_09_28
d               2018_09_08

I want to select the unique_IDs that occur on all three dates, i.e. 2018_09_08, 2018_09_18 and 2018_09_28.

My output should be just 'a'.

There is a long solution to this problem: extract the unique_IDs per date, create an external table on top of each of the three results, and then join the three tables to get the unique_IDs present on all three dates. I believe there should be a better way. There are only 3 dates in this case, but the number might grow later, so I am looking for a more general solution.

Here is the query that I have written:

select distinct(unique_ID) from table_name where Date = '2018_09_08' and Date = '2018_09_18' and Date = '2018_09_28'

It returns null.

I am also trying to write a subquery, but I doubt Hive supports subqueries of this kind. Here is what I have written:

select count(distinct(unique_ID)) from (
(select distinct(unique_ID) from table_name where Date = '2018_09_08') a
union all
(select distinct(unique_ID) from table_name where Date = '2018_09_18') b
union all
(select distinct(unique_ID) from table_name where Date = '2018_09_28') c
);

I am getting the following parse error: FAILED: ParseException line 3:0 missing ) at 'union' near ')' line 4:87 missing EOF at 'b' near ')'

How can I get the unique_IDs in this case?

Upvotes: 0

Views: 226

Answers (1)

Vamsi Prabhala

Reputation: 49260

This can be accomplished with group by and having.

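select unique_id, count(distinct date)
from tbl
where date in ('2018_09_08','2018_09_18','2018_09_28')
-- group by the same column that is selected (unique_id, not id)
group by unique_id
-- keep only the IDs seen on all three listed dates
having count(distinct date) = 3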
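
To sanity-check the group by / having logic on the sample rows from the question, here is a minimal sketch that runs the same kind of query against an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for Hive here, and the table name tbl, the column name dt (used instead of date to avoid keyword clashes) and the helper ids_on_all_dates are assumptions made for illustration. The count in the having clause is taken from the length of the date list, so the query does not need to change when more dates are added.

```python
import sqlite3

# Sample rows from the question. Table/column names (tbl, unique_id, dt)
# are hypothetical, and SQLite stands in for Hive in this sketch.
ROWS = [
    ("a", "2018_09_08"),
    ("a", "2018_09_18"),
    ("a", "2018_09_28"),
    ("d", "2018_09_08"),
]


def ids_on_all_dates(rows, dates):
    """Return the IDs that appear on every date in `dates`."""
    wanted = sorted(set(dates))  # de-duplicate so the expected count is right
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE tbl (unique_id TEXT, dt TEXT)")
        conn.executemany("INSERT INTO tbl VALUES (?, ?)", rows)
        placeholders = ", ".join("?" for _ in wanted)
        query = (
            "SELECT unique_id FROM tbl "
            f"WHERE dt IN ({placeholders}) "
            "GROUP BY unique_id "
            # An ID qualifies only if it was seen on every requested date.
            "HAVING COUNT(DISTINCT dt) = ? "
            "ORDER BY unique_id"
        )
        return [r[0] for r in conn.execute(query, wanted + [len(wanted)])]
    finally:
        conn.close()


print(ids_on_all_dates(ROWS, ["2018_09_08", "2018_09_18", "2018_09_28"]))
```

Because the comparison uses count(distinct ...), repeated rows for the same ID and date do not inflate the count. In Hive itself the date list and the expected count would have to be supplied in the query text or through variable substitution, which is not shown here.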

Upvotes: 2
