VijZ
VijZ

Reputation: 69

hive query with counts

We have a Hive Table like below:

Hive Table

We would like to see output like below:

Output

For each date, display the counts of customer who bought each fruit. if no one bought a fruit for that day, replace it with 0 in place of NULL. any Apache Query to achieve this is highly appreciated

Update: There will only be four products Apples, Bananas, Grapes and Oranges. we would not want to find distinct dates and distinct products.

Upvotes: 1

Views: 11441

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1271141

Use cross join to generate the rows and then left join and group by for the calcualtion:

select d.date, p.product, count(t.date) 
from (select distinct date from t) d cross join
     (select distinct product from t) p left join
     t
     on t.date = d.date and t.product = p.product
group by d.date, p.product
order by d.date, p.product;

The hint that you need cross join is that some of the returned counts are 0, so data combinations not in the original data need to be "generated".

Upvotes: 0

GMB
GMB

Reputation: 222702

If your RDMS supports COUNT(DISTINCT ...), that's a simple aggregate query:

SELECT mydate, product, COUNT(DISTINCT customer)
FROM mytable
GROUP BY mydate, product

PS : it is usually not a good idea to name a column date, as this conflicts with the homonym sql datatype. I changed it to mydate in the queries.


Without COUNT(DISTINCT ...), you would need to use a subquery:

SELECT mydate, product, COUNT(*)
FROM (SELECT DISTINCT mydate, product, customer FROM mytable) x
GROUP BY mydate, product

Upvotes: 1

Related Questions