Can't Tell
Can't Tell

Reputation: 13416

Rolling Sum Over Date index

I have the following code

data = {'date': ['2014-05-05', '2014-05-01', '2014-05-02',
                 '2014-05-02', '2014-05-02', '2014-05-02',
                 '2014-05-03', '2014-05-03', '2014-05-04',
                 '2014-05-04'],
        'battle_deaths': [34, 25, 26, 15, 15, 14, 26, 25, 62, 41],
        'group': [1, 2, 3, 2, 2, 3, 2, 1, 3, 1]}
df = pd.DataFrame(data, columns=['date', 'battle_deaths', 'group' ''])

df = df.set_index('date')
df.groupby('group').rolling(2).sum()

Which outputs

                 battle_deaths  group
group date                            
1     2014-05-05            NaN    NaN
      2014-05-03           59.0    2.0
      2014-05-04           66.0    2.0
2     2014-05-01            NaN    NaN
      2014-05-02           40.0    4.0
      2014-05-02           30.0    4.0
      2014-05-03           41.0    4.0
3     2014-05-02            NaN    NaN
      2014-05-02           40.0    6.0
      2014-05-04           76.0    6.0

But since the index is the date, I would expect for the first group that the battle_deaths column should have NaN value for date 2014-05-03. In other words I would like to do the rolling sum within each group sorted by the date. How can I achieve this?

Upvotes: 1

Views: 121

Answers (2)

jezrael
jezrael

Reputation: 862511

I think need convert dates to datetimes and sorting by sort_values:

df['date'] = pd.to_datetime(df['date'])
df = df.sort_values('date').set_index('date')
df = df.groupby('group').rolling(2).sum()

                  battle_deaths  group
group date                            
1     2014-05-03            NaN    NaN
      2014-05-04           66.0    2.0
      2014-05-05           75.0    2.0
2     2014-05-01            NaN    NaN
      2014-05-02           40.0    4.0
      2014-05-02           30.0    4.0
      2014-05-03           41.0    4.0
3     2014-05-02            NaN    NaN
      2014-05-02           40.0    6.0
      2014-05-04           76.0    6.0

Upvotes: 1

jeevs
jeevs

Reputation: 131

That dataframe needs to be sorted, here is how i would do it.

df.sort_index().groupby('group').rolling(2).sum()

Upvotes: 3

Related Questions