pandas rolling max with groupby

Question

I have a problem getting the rolling function of Pandas to do what I wish. I want for each frow to calculate the maximum so far within the group. Here is an example:

df = pd.DataFrame([[1,3], [1,6], [1,3], [2,2], [2,1]], columns=['id', 'value'])

looks like

   id  value
0   1      3
1   1      6
2   1      3
3   2      2
4   2      1

Now I wish to obtain the following DataFrame:

   id  value
0   1      3
1   1      6
2   1      6
3   2      2
4   2      2

The problem is that when I do

df.groupby('id')['value'].rolling(1).max()

I get the same DataFrame back. And when I do

df.groupby('id')['value'].rolling(3).max()

I get a DataFrame with Nans. Can someone explain how to properly use rolling or some other Pandas function to obtain the DataFrame I want?

MaxU - stand with Ukraine · Accepted Answer

It looks like you need cummax() instead of .rolling(N).max()

In [29]: df['new'] = df.groupby('id').value.cummax()

In [30]: df
Out[30]:
   id  value  new
0   1      3    3
1   1      6    6
2   1      3    6
3   2      2    2
4   2      1    2

Timing (using brand new Pandas version 0.20.1):

In [3]: df = pd.concat([df] * 10**4, ignore_index=True)

In [4]: df.shape
Out[4]: (50000, 2)

In [5]: %timeit df.groupby('id').value.apply(lambda x: x.cummax())
100 loops, best of 3: 15.8 ms per loop

In [6]: %timeit df.groupby('id').value.cummax()
100 loops, best of 3: 4.09 ms per loop

NOTE: from Pandas 0.20.0 what's new

Improved performance of groupby().cummin() and groupby().cummax() (GH15048, GH15109, GH15561, GH15635)

pandas rolling max with groupby

Answers (2)

Related Questions