Reputation: 69
We have a Hive Table like below:
We would like to see output like below:
For each date, display the counts of customer who bought each fruit. if no one bought a fruit for that day, replace it with 0 in place of NULL. any Apache Query to achieve this is highly appreciated
Update: There will only be four products Apples, Bananas, Grapes and Oranges. we would not want to find distinct dates and distinct products.
Upvotes: 1
Views: 11441
Reputation: 1271141
Use cross join
to generate the rows and then left join
and group by
for the calcualtion:
select d.date, p.product, count(t.date)
from (select distinct date from t) d cross join
(select distinct product from t) p left join
t
on t.date = d.date and t.product = p.product
group by d.date, p.product
order by d.date, p.product;
The hint that you need cross join
is that some of the returned counts are 0
, so data combinations not in the original data need to be "generated".
Upvotes: 0
Reputation: 222702
If your RDMS supports COUNT(DISTINCT ...)
, that's a simple aggregate query:
SELECT mydate, product, COUNT(DISTINCT customer)
FROM mytable
GROUP BY mydate, product
PS : it is usually not a good idea to name a column date
, as this conflicts with the homonym sql datatype. I changed it to mydate
in the queries.
Without COUNT(DISTINCT ...)
, you would need to use a subquery:
SELECT mydate, product, COUNT(*)
FROM (SELECT DISTINCT mydate, product, customer FROM mytable) x
GROUP BY mydate, product
Upvotes: 1