HappyPy

Reputation: 10697

Split DataFrame values into a specified number of groups and apply a function - pandas

df=pd.DataFrame([1,4,1,3,2,8,3,6,3,7,3,1,2,9])

I'd like to split df into a specified number of groups and sum all elements in each group. For example, dividing df into 4 groups

1,4,1,3  2,8,3,6  3,7,3,1  2,9 

would result in

9
19
14
11

I could do df.groupby(np.arange(len(df))//4).sum(), but the 4 there fixes the size of each group rather than the number of groups, so this won't work for larger dataframes.

For example

df1=pd.DataFrame([1,4,1,3,2,8,3,6,3,7,3,1,2,9,1,5,3,4])
df1.groupby(np.arange(len(df1))//4).sum()

creates 5 groups instead of 4

Upvotes: 3

Views: 329

Answers (3)

jezrael

Reputation: 863291

You can use numpy.array_split:

df=pd.DataFrame([1,4,1,3,2,8,3,6,3,7,3,1,2,9,1,5,3,4])

a = pd.Series([x.values.sum() for x in np.array_split(df, 4)])
print (a)
0    11
1    27
2    15
3    13
dtype: int64

Solution with concat and sum:

# sum(level=0) was removed in pandas 2.0; groupby(level=0).sum() is the equivalent
a = pd.concat(np.array_split(df, 4), keys=np.arange(4)).groupby(level=0).sum()
print (a)
    0
0  11
1  27
2  15
3  13
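
A groupby-based variant of the same idea, as a minimal sketch: it assumes you want the group sizes that np.array_split produces (the first len(df) % n groups get one extra row). The names n_groups, sizes and labels are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([1,4,1,3,2,8,3,6,3,7,3,1,2,9,1,5,3,4])
n_groups = 4

# Split the row positions (not the frame) to learn how many rows each group gets
sizes = [len(part) for part in np.array_split(np.arange(len(df)), n_groups)]

# Build one integer label per row: 0 repeated sizes[0] times, 1 repeated sizes[1] times, ...
labels = np.repeat(np.arange(n_groups), sizes)

# Group by the labels and sum each group
result = df.groupby(labels).sum()
print(result)
```

Since the labels are an ordinary array, any other groupby aggregation (mean, max, agg with a custom function) can be swapped in for sum.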

Upvotes: 3

Chiheb Nexus

Reputation: 9267

I read the comments, and I think you can fall back on explicit Python code when the "usual" pandas functions can't fulfill your needs.

So:

import pandas as pd

def get_sum(a, chunks):
    # iterate over the argument a, not the global df
    for k in range(0, len(a), chunks):
        yield a[k:k+chunks].values.sum()

df = pd.DataFrame([1,4,1,3,2,8,3,6,3,7,3,1,2,9])

# 4 is the chunk size (rows per group), not the number of groups
group_sums = list(get_sum(df, 4))
print(group_sums)

Output:

[9, 19, 14, 11]
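
Note that the second argument above is a chunk size, while the question asks for a number of groups. A rough sketch of one way to bridge that, assuming it is acceptable to derive the chunk size as ceil(len / n_groups); chunk_sums is a hypothetical helper name, and it assumes a non-empty frame and n_groups > 0:

```python
import math
import pandas as pd

def chunk_sums(frame, n_groups):
    # Hypothetical helper: derive the rows per chunk from the desired number of groups
    size = math.ceil(len(frame) / n_groups)
    # Slice by position and sum every value in each chunk
    return [int(frame.iloc[k:k + size].values.sum())
            for k in range(0, len(frame), size)]

df1 = pd.DataFrame([1,4,1,3,2,8,3,6,3,7,3,1,2,9,1,5,3,4])
sums = chunk_sums(df1, 4)
print(sums)
```

This can still return fewer than n_groups chunks for some lengths (9 rows and 4 groups gives a chunk size of 3, hence only 3 chunks), so it is not a drop-in replacement for numpy.array_split.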

Upvotes: 0

Carles Mitjans

Reputation: 4866

Say you have this data frame:

df = pd.DataFrame([1,4,1,3,2,8,3,6,3,7,3,1,2,9])

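You can achieve it using a list comprehension and loc. Slicing with loc is label-based and includes the end label, hence the -1; this relies on the default integer index: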

group_size = 4
[df.loc[i:i+group_size-1].values.sum() for i in range(0, len(df), group_size)]

Output:

[9, 19, 14, 11]
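
If the frame does not have the default integer index, a position-based version with iloc may be safer. A small sketch, where the letter index is made up purely for illustration:

```python
import pandas as pd

# Same values, but with a non-integer index (illustrative only)
df = pd.DataFrame([1,4,1,3,2,8,3,6,3,7,3,1,2,9], index=list("abcdefghijklmn"))

group_size = 4
# iloc slices by position and excludes the end, so no -1 is needed
sums = [int(df.iloc[i:i + group_size].values.sum())
        for i in range(0, len(df), group_size)]
print(sums)
```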

Upvotes: 0
