Reputation: 66
I have a requirement where-in i need to do data summarization on the date range provided as input. To be more specific: If my data looks like:
Input:
Id|amount|date
1 |10 |2016-01-01
2 |20 |2016-01-02
3 |20 |2016-01-03
4 |20 |2016-09-25
5 |20 |2016-09-26
6 |20 |2016-09-28
And If I want the summarization for the month of September, then I need to calculate count of records on 4 ranges which are:
So My output should have one record with 4 Columns for each day of the month(in this case, Month is September), Something like
Output:
Current_Date|Current_date_count|Week_To_Date_Count|Month_to_date_Count|Year_to_date_count
2016-09-25 |1 |1 |1 |4
2016-09-26 |1 |2 |3 |5
2016-09-28 |1 |3 |3 |6
Important: i can pass only 2 variables, which is range start date and range end date. Rest calculation need to be dynamic.
Thanks in advance
Upvotes: 2
Views: 185
Reputation: 1491
You can join on year, then test each condition separately (using sum(if())
):
select a.date, sum(if(a.date=b.date,1,0)),
sum(if(month(a.date)=month(b.date) and weekofyear(a.date)=weekofyear(b.date),1,0)),
sum(if(month(a.date)=month(b.date),1,0)),
count(*) from
(select * from input_table where date >= ${hiveconf:start} and date <${hiveconf:end}) a,
(select * from input_table where date <${hiveconf:end}) b
where year(a.date)=year(b.date) and b.date <= a.date group by a.date;
Upvotes: 1