Reputation: 417
I want to calculate the rolling average looking forward on "4 day window" basis. Please find the details below
Create table stock(day int, time String, cost float);
Insert into stock values(1,"8 AM",3.1);
Insert into stock values(1,"9 AM",3.2);
Insert into stock values(1,"10 AM",4.5);
Insert into stock values(1,"11 AM",5.5);
Insert into stock values(2,"8 AM",5.1);
Insert into stock values(2,"9 AM",2.2);
Insert into stock values(2,"10 AM",1.5);
Insert into stock values(2,"11 AM",6.5);
Insert into stock values(3,"8 AM",8.1);
Insert into stock values(3,"9 AM",3.2);
Insert into stock values(3,"10 AM",2.5);
Insert into stock values(3,"11 AM",4.5);
Insert into stock values(4,"8 AM",3.1);
Insert into stock values(4,"9 AM",1.2);
Insert into stock values(4,"10 AM",0.5);
Insert into stock values(4,"11 AM",1.5);
I wrote the below query
select day, cost,sum(cost) over (order by day range between current row and 4 Following), avg(cost) over (order by day range between current row and 4 Following)
from stock
As you can see, I get 4 records for each day and I need to calculate rolling average on 4 day window. For this, I wrote the above window query, as I have the data only for 4 days each day containing 4 records, the sum for the first day will be the total of all the 16 records. Based on this the first record will have the sum of 56.20 which is correct and the average should be 56.20/4 (as there are 4 days), but it is doing 56.20/16 as there 16 records in total. How do I fix the average part of this?
Thanks Raj
Upvotes: 1
Views: 491
Reputation: 1269633
Is this what you want?
select t.*,
avg(cost) over (order by day range between current row and 4 following)
from t;
EDIT:
What you seem to want is:
select t.*,
(sum(cost) over (order by day range between current row and 3 following) /
count(distinct day) over (order by day range between current row and 3 following)
)
from t;
But, Hive does not support this. You can use a subquery for this purpose:
select t.*,
(sum(cost) over (order by day range between current row and 3 following) /
sum(case when seqnum = 1 then 1 else 0 end) over (order by day range between current row and 3 following)
)
from (select t.*
row_number() over (partition by day order by time) as seqnum
from t
)t
Upvotes: 1