Reputation: 2220
I have dataframe which looks like:
df
ID RY Month INCIDENT_NO
AB10 2019 7 3
AB10 2019 12 3
AB1286 2011 5 2
AB1286 2012 1 2
AB1286 2014 4 1
I am trying to create a rolling sum of INCIDENT_NO of 12 months based on ID. I am writing code like follow:
df.groupby('ID').rolling(12,on='RY').INCIDENT_NO.sum()
It produces the following output:
ID RY
AB10 2019 NaN
2019 NaN
AB1286 2011 NaN
2012 NaN
2014 NaN
...
WS7 2020 30.0
WS8 2016 NaN
WS9 2014 NaN
2018 NaN
2019 NaN
However, it did not produce the desired result. For example, in ID AB10, 2019 RY appeared twice, it should appear once. Could anyone point me where I made the mistake? Moreover, I want to show the rolling sum of each year (even if the data is not present). For example, ID ws9 does not have any data of RY 2020, it should shows sum zero.
Any help would be really appreciated.
Upvotes: 0
Views: 693
Reputation: 16683
If there are years and months missing from your data, then you can do reindex
the data and create a pivot_table
.
df = df.groupby(["ID", "RY", "Month"])['INCIDENT_NO'].sum().reset_index()
idx = pd.MultiIndex.from_product([df['ID'].unique(),
range(df['RY'].min(), df['RY'].max()+1),
range(df['Month'].min(), df['Month'].max()+1)],
names=['ID', 'RY', 'Month'])
(df.set_index(["ID", "RY", "Month"]).reindex(idx).sort_index().fillna(0).reset_index()
.pivot_table(index='ID', values='INCIDENT_NO', columns='RY', aggfunc='sum'))
Out[205]:
RY 2011 2012 2013 2014 2015 2016 2017 2018 2019
ID
AB10 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 6.0
AB1286 2.0 2.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0
Upvotes: 1
Reputation: 167
I think you should first make sure that the data is in correct ascending order for each month on each ID by grouping 'ID', 'RY', and 'month' (Each row represent each month), then just add rolling(12)
df.groupby(['ID', 'RY', 'month']).rolling(12).sum()
Upvotes: 1
Reputation: 13255
I think you need groupby
and sum
:
df.groupby(['ID','RY'], as_index=False)['INCIDENT_NO'].sum().fillna(0, downcast='infer')
Upvotes: 1