Reputation: 7526
I'd like to find a general way to group a DataFrame by a specified number of rows or columns. Example DataFrame:
import pandas as pd
df = pd.DataFrame(0, index=['a', 'b', 'c', 'd', 'e', 'f'], columns=['c1', 'c2', 'c3', 'c4', 'c5', 'c6', 'c7'])
   c1  c2  c3  c4  c5  c6  c7
a   0   0   0   0   0   0   0
b   0   0   0   0   0   0   0
c   0   0   0   0   0   0   0
d   0   0   0   0   0   0   0
e   0   0   0   0   0   0   0
f   0   0   0   0   0   0   0
For example, I'd like to group 2 rows at a time and apply a function such as mean. I'd also like to know how to group N columns at a time and apply a function.
Expected output when grouping 2 rows at a time:
   c1  c2  c3  c4  c5  c6  c7
0   0   0   0   0   0   0   0
1   0   0   0   0   0   0   0
2   0   0   0   0   0   0   0
Expected output when grouping 2 columns at a time:
   0  1  2  3
a  0  0  0  0
b  0  0  0  0
c  0  0  0  0
d  0  0  0  0
e  0  0  0  0
f  0  0  0  0
Upvotes: 6
Views: 6662
Reputation: 7526
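This groups every N consecutive rows (integer division of the row positions by N gives each block of N rows the same label):
>>> import numpy as np
>>> N = 2
>>> df.groupby(np.arange(len(df.index)) // N).mean()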
   c1  c2  c3  c4  c5  c6  c7
0   0   0   0   0   0   0   0
1   0   0   0   0   0   0   0
2   0   0   0   0   0   0   0
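The grouping key here is just one integer label per position, produced by floor division. A quick sketch of what those labels look like for the 6 rows and 7 columns of the example (the names row_labels and col_labels are only for illustration):

```python
import numpy as np

N = 2
# Six rows: positions 0..5 floor-divided by N pair up consecutive rows.
row_labels = np.arange(6) // N
# Seven columns: the last group ends up with a single leftover column.
col_labels = np.arange(7) // N

print(row_labels.tolist())  # [0, 0, 1, 1, 2, 2]
print(col_labels.tolist())  # [0, 0, 1, 1, 2, 2, 3]
```

Rows or columns that share a label are aggregated together, which is why the result index is 0, 1, 2 (or 0 to 3 for the columns).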
This groups every N consecutive columns:
>>> df.groupby(np.arange(len(df.columns))//N, axis=1).mean()
   0  1  2  3
a  0  0  0  0
b  0  0  0  0
c  0  0  0  0
d  0  0  0  0
e  0  0  0  0
f  0  0  0  0
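As far as I know, recent pandas releases (2.1 and later) deprecate the axis argument of DataFrame.groupby, so the axis=1 form may warn or stop working there. A possible workaround is to transpose, group the rows, and transpose back. Below is a minimal sketch of that idea; the helper names group_rows and group_cols are hypothetical, and the non-zero data is made up so the means are visible:

```python
import numpy as np
import pandas as pd


def group_rows(df, n, func="mean"):
    """Aggregate every n consecutive rows (illustrative helper, not a pandas API)."""
    labels = np.arange(len(df.index)) // n
    return df.groupby(labels).agg(func)


def group_cols(df, n, func="mean"):
    """Aggregate every n consecutive columns: transpose, group rows, transpose back."""
    labels = np.arange(len(df.columns)) // n
    return df.T.groupby(labels).agg(func).T


# Made-up data: 6 rows x 7 columns holding the integers 0..41.
df = pd.DataFrame(
    np.arange(42).reshape(6, 7),
    index=list("abcdef"),
    columns=[f"c{i}" for i in range(1, 8)],
)

by_rows = group_rows(df, 2)  # 3 rows x 7 columns
by_cols = group_cols(df, 2)  # 6 rows x 4 columns
print(by_rows)
print(by_cols)
```

The transpose route is simplest when all columns share one dtype; with mixed dtypes the transposed frame becomes object-typed and numeric aggregations may not behave as expected.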
Upvotes: 11