Reputation: 115
I am working on a data set which has the following columns :
unique_ID Date
a 2018_09_08
a 2018_09_18
a 2018_09_28
d 2018_09_08
I am looking to select those Unique_IDs which are occurring on all three dates i.e 2018_09_08, 2018_09_18 and 2018_09_28.
My output should be just 'a'.
There is a long solution to this problem - Extract unique_IDs per date and create external table on top of all three of them and then use join on three tables to get unique IDs for all three dates. I believe there should be a better solution as we have just 3 dates in this case which might rise later so I am looking for a more generalized solution.
Here is the query that I have written - select distinct(unique_ID) from table_name where Date = '2018_09_08' and Date = '2018_09_18' and Date = '2018_09_28'
which is returning null.
I am also trying to write a sub-query but I doubt HIVE supports such sub queries in this case. Here is what I have written :
select count(distinct(unique_ID)) from (
(select distinct(unique_ID) from table_name where Date = '2018_09_08') a
union all
(select distinct(unique_ID) from table_name where Date = '2018_09_18') b
union all
(select distinct(unique_ID) from table_name where Date = '2018_09_28') c
);
and I am getting following parsing error : FAILED: ParseException line 3:0 missing ) at 'union' near ')' line 4:87 missing EOF at 'b' near ')'
How could we get the Unique_IDs in this case ?
Upvotes: 0
Views: 226
Reputation: 49260
This can be accomplished with group by
and having
.
select unique_id,count(distinct date)
from tbl
where date in ('2018_09_08','2018_09_18','2018_09_28')
group by id
having count(distinct date) = 3
Upvotes: 2