Reputation: 21
I want to calculate the min/max for every n rows of a df, say 10, but using df.rolling(10).max() gives the values for rows 0-9, 1-10, 2-11 etc. I want 0-9, 10-19, 20-29 etc
Is there a neat way of doing this for a large dataset?
Thanks
Upvotes: 2
Views: 3591
Reputation: 2135
Starting from pandas 1.5.0 you can use df.rolling(10, step=10).max()
.
From Pandas documentation:
step: int, default None
Evaluate the window at every step result, equivalent to slicing as [::step]. window must be an integer. Using a step argument other than None or 1 will produce a result with a different shape than the input.
Upvotes: 0
Reputation: 30920
Use np.arange
to calculate an array of 0
to len (df) -1
and then calculate the entire divison. We can use this array together with DataFrame.groupby
.
n=9
df.groupby(np.arange(len(df))//n).max()
Setup
df=pd.DataFrame({'A':range(30)})
print(df)
A
0 0
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
11 11
12 12
13 13
14 14
15 15
16 16
17 17
18 18
19 19
20 20
21 21
22 22
23 23
24 24
25 25
26 26
27 27
28 28
29 29
Solution
n=9
df_max = df.groupby(np.arange(len(df))//n).max()
print(df_max)
A
0 8
1 17
2 26
3 29
As you can see, groups listed from 0 to m are created. The resulting DataFrame is the maximum of each group.
You could also select columns to calculate max
df.groupby(np.arange(len(df))//n)[cols].max()
and of course use groupby.agg
if you want to use more than one function
Detail
np.arange(len(df))//n
#array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2,
# 2, 2, 2, 2, 2, 3, 3, 3])
EDIT
first_row = 2
df2 = df[first_row:]
df2.groupby(np.arange(len(df2))//n).max()
Upvotes: 3
Reputation: 30971
The key to success is grouping by np.arange(len(df.index)) // 10 (or change 10 (the group size) to whatever other value you need).
Assuming that you have 3 columns to compute min / max (['A', 'B', 'C']), you can run:
df.groupby(np.arange(len(df.index)) // 10)[['A', 'B', 'C']].agg(['min', 'max'])
Upvotes: 3