user2293224
user2293224

Reputation: 2220

Python pandas: rolling sum does not produce 12 month sum

I have dataframe which looks like:

df

ID      RY    Month INCIDENT_NO
AB10    2019    7   3
AB10    2019    12  3
AB1286  2011    5   2
AB1286  2012    1   2
AB1286  2014    4   1

I am trying to create a rolling sum of INCIDENT_NO of 12 months based on ID. I am writing code like follow:

df.groupby('ID').rolling(12,on='RY').INCIDENT_NO.sum()

It produces the following output:

  ID         RY  
    AB10      2019     NaN
              2019     NaN
    AB1286    2011     NaN
              2012     NaN
              2014     NaN
                      ... 
    WS7       2020    30.0
    WS8       2016     NaN
    WS9       2014     NaN
              2018     NaN
              2019     NaN

However, it did not produce the desired result. For example, in ID AB10, 2019 RY appeared twice, it should appear once. Could anyone point me where I made the mistake? Moreover, I want to show the rolling sum of each year (even if the data is not present). For example, ID ws9 does not have any data of RY 2020, it should shows sum zero.

Any help would be really appreciated.

Upvotes: 0

Views: 693

Answers (3)

David Erickson
David Erickson

Reputation: 16683

If there are years and months missing from your data, then you can do reindex the data and create a pivot_table.

df = df.groupby(["ID", "RY", "Month"])['INCIDENT_NO'].sum().reset_index()
idx = pd.MultiIndex.from_product([df['ID'].unique(),
                              range(df['RY'].min(), df['RY'].max()+1),
                              range(df['Month'].min(), df['Month'].max()+1)],
                             names=['ID', 'RY', 'Month'])
(df.set_index(["ID", "RY", "Month"]).reindex(idx).sort_index().fillna(0).reset_index()
 .pivot_table(index='ID', values='INCIDENT_NO', columns='RY', aggfunc='sum'))

Out[205]: 
RY      2011  2012  2013  2014  2015  2016  2017  2018  2019
ID                                                          
AB10     0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   6.0
AB1286   2.0   2.0   0.0   1.0   0.0   0.0   0.0   0.0   0.0

Upvotes: 1

Iqbal Basyar
Iqbal Basyar

Reputation: 167

I think you should first make sure that the data is in correct ascending order for each month on each ID by grouping 'ID', 'RY', and 'month' (Each row represent each month), then just add rolling(12)

df.groupby(['ID', 'RY', 'month']).rolling(12).sum()

Upvotes: 1

Space Impact
Space Impact

Reputation: 13255

I think you need groupby and sum:

df.groupby(['ID','RY'], as_index=False)['INCIDENT_NO'].sum().fillna(0, downcast='infer')

Upvotes: 1

Related Questions