espogian
espogian

Reputation: 607

Rolling sum over a period with SQL/Impala

I have a table structured as the following:

DateDay     Id     Value    
20200801    A      7
20200803    A      1
20200803    B      3
20200804    C      30
...         ...    ...

Where DateDay is present only if Value > 0 And I have several possible Ids

What I would like to do is obtain a table with a rolling sum of Value over a period of 7 days, per each Id. But it should have all the days in it

DateDay     Id     Value    RollingSum    
20200801    A      7        7
20200801    B      0        0
20200801    C      0        0
20200802    A      0        7
20200802    B      0        0
20200802    C      0        0
20200803    A      1        8        
20200803    B      3        3
20200803    C      0        0
20200804    A      0        8
20200804    B      0        3
20200804    C      30       30
...         ...    ...      ...

Is there a simple way to accomplish this via SQL (I'm using Impala/Hive)?

Upvotes: 1

Views: 3124

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269623

You have two problems -- generating the rows and the cumulative sums. Let me assume that the table has all the time periods you want or you have another table with them. Then use a cross join to generate the rows and a cumulative sum for the results:

select d.dateday, i.id,
       sum(t.value) over (partition by i.id order by d.dateday) as running_sum
from (select distinct dateday from t) d cross join
     (select distinct id from t) i left join
     t
     on t.dateday = d.dateday and t.id = i.id
order by d.dateday, i.id;

If you have other tables with the days or ids, then use those instead of the subqueries.

Once you have all the days, you can include a window frame clause:

       sum(t.value) over (partition by i.id order by d.dateday rows between 6 preceding and current day) as running_sum

Upvotes: 2

Related Questions