user12845343
user12845343

Reputation: 21

Calculating max/min for every n rows of dataframe in python

I want to calculate the min/max for every n rows of a df, say 10, but using df.rolling(10).max() gives the values for rows 0-9, 1-10, 2-11 etc. I want 0-9, 10-19, 20-29 etc

Is there a neat way of doing this for a large dataset?

Thanks

Upvotes: 2

Views: 3591

Answers (3)

evedovelli
evedovelli

Reputation: 2135

Starting from pandas 1.5.0 you can use df.rolling(10, step=10).max().

From Pandas documentation:

step: int, default None

Evaluate the window at every step result, equivalent to slicing as [::step]. window must be an integer. Using a step argument other than None or 1 will produce a result with a different shape than the input.

Upvotes: 0

ansev
ansev

Reputation: 30920

Use np.arange to calculate an array of 0 to len (df) -1 and then calculate the entire divison. We can use this array together with DataFrame.groupby.

n=9
df.groupby(np.arange(len(df))//n).max()

Example

Setup

df=pd.DataFrame({'A':range(30)})
print(df)
     A
0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
10  10
11  11
12  12
13  13
14  14
15  15
16  16
17  17
18  18
19  19
20  20
21  21
22  22
23  23
24  24
25  25
26  26
27  27
28  28
29  29

Solution

n=9
df_max = df.groupby(np.arange(len(df))//n).max()
print(df_max)
    A
0   8
1  17
2  26
3  29

As you can see, groups listed from 0 to m are created. The resulting DataFrame is the maximum of each group.


You could also select columns to calculate max

df.groupby(np.arange(len(df))//n)[cols].max()

and of course use groupby.agg if you want to use more than one function

Detail

np.arange(len(df))//n
#array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2,
#       2, 2, 2, 2, 2, 3, 3, 3])

EDIT

first_row = 2
df2 = df[first_row:]
df2.groupby(np.arange(len(df2))//n).max()

Upvotes: 3

Valdi_Bo
Valdi_Bo

Reputation: 30971

The key to success is grouping by np.arange(len(df.index)) // 10 (or change 10 (the group size) to whatever other value you need).

Assuming that you have 3 columns to compute min / max (['A', 'B', 'C']), you can run:

df.groupby(np.arange(len(df.index)) // 10)[['A', 'B', 'C']].agg(['min', 'max'])

Upvotes: 3

Related Questions