Stacey

Grouping/Summarising and summing a data-frame based on another column

I have a dataframe (df), created from a NumPy array, that looks like this:

0.22    1
0.31    1
0.91    1
0.48    2
0.2     2
0.09    2
0.9     3
0.71    3
0.73    3
0.65    4
0.16    4
0.9     4
0.75    5
0.87    5
0.72    5
0.68    6
0.54    6
0.48    6

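I want to create a summary data-frame that sums the values in the first column according to each row's position within its group in the second column: the first row of every group is summed together, then the second row of every group, and so on. For the example above, the desired summary data-frame would look like: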

3.68    total
2.79    total
3.83    total

Where: the first value in the summary data-frame would be equal to: 0.22+0.48+0.9+0.65+0.75+0.68=3.68

the second value in the summary data-frame would be equal to: 0.31+0.2+0.71+0.16+0.87+0.54=2.79

the third value in the summary data-frame would be equal to: 0.91+0.09+0.73+0.9+0.72+0.48=3.83
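To try the answers below, the example frame can be rebuilt roughly like this. It is a sketch that assumes the array simply stacks the values and the group labels side by side; the names `values` and `labels` are illustrative and not from the question.

```python
import numpy as np
import pandas as pd

# Column 0: the values from the question.
values = [0.22, 0.31, 0.91, 0.48, 0.2, 0.09, 0.9, 0.71, 0.73,
          0.65, 0.16, 0.9, 0.75, 0.87, 0.72, 0.68, 0.54, 0.48]
# Column 1: group labels 1..6, each repeated three times.
labels = np.repeat(np.arange(1, 7), 3)

# Stacking into one 2-D array makes every entry float64, and pd.DataFrame
# assigns the default integer column labels 0 and 1.
df = pd.DataFrame(np.column_stack([values, labels]))
print(df.head(4))
```

Note that column 1 ends up as floats (1.0, 2.0, ...) because a NumPy array has a single dtype.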

Answers (2)

Quang Hoang

You can use groupby twice: once to label each row's relative position within its group, and once to sum by that label:

df[0].groupby(df.groupby(df[1]).cumcount()).sum()

Output:

0    3.68
1    2.79
2    3.83
Name: 0, dtype: float64
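Splitting the one-liner into its two steps may make it easier to follow. This sketch rebuilds the question's data with plain Python lists (an assumption for illustration) and shows the intermediate position labels that `cumcount()` produces.

```python
import pandas as pd

# The question's data: column 0 holds the values, column 1 the group label.
df = pd.DataFrame({
    0: [0.22, 0.31, 0.91, 0.48, 0.2, 0.09, 0.9, 0.71, 0.73,
        0.65, 0.16, 0.9, 0.75, 0.87, 0.72, 0.68, 0.54, 0.48],
    1: [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6],
})

# Step 1: number the rows 0, 1, 2, ... inside each group of column 1.
position = df.groupby(df[1]).cumcount()

# Step 2: sum column 0 over all rows that share the same position.
result = df[0].groupby(position).sum()
print(result)

# Groups of unequal size still work: position 2 only exists in group "a".
uneven = pd.DataFrame({0: [1.0, 2.0, 3.0, 10.0, 20.0],
                       1: ["a", "a", "a", "b", "b"]})
uneven_pos = uneven.groupby(uneven[1]).cumcount()
uneven_result = uneven[0].groupby(uneven_pos).sum()
# uneven_result: position 0 -> 11.0, 1 -> 22.0, 2 -> 3.0
```

Because `cumcount()` restarts inside every group and counts rows in order of appearance, this approach does not need equal group sizes or rows stored contiguously by group.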

Option 2: if all groups have the same number of elements and the rows are stored group by group, you can simply reshape:

df[0].to_numpy().reshape(df[1].nunique(), -1).sum(0)
# out
# array([3.68, 2.79, 3.83])
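The reshape is only correct when every group has the same size, so it can be worth checking that first. Below is a minimal sketch; `sum_by_position_reshape` is a hypothetical helper name, and it assumes rows are already ordered group by group (which it does not verify). Counting the groups, rather than taking the largest label, also keeps the row count a plain int when column 1 holds floats, as it does when df comes from a float NumPy array.

```python
import numpy as np
import pandas as pd


def sum_by_position_reshape(df: pd.DataFrame) -> np.ndarray:
    """Hypothetical helper wrapping the reshape idea with a size check.

    Assumes rows are stored group by group (all of group 1, then group 2, ...),
    as in the question; the check below does not verify that ordering.
    """
    sizes = df[1].value_counts()
    if sizes.nunique() != 1:
        raise ValueError("groups differ in size; use the cumcount approach instead")
    n_groups = len(sizes)  # a plain int, even when column 1 holds floats
    # One row per group; summing down axis 0 adds the 1st, 2nd, ... element
    # of every group together.
    return df[0].to_numpy().reshape(n_groups, -1).sum(axis=0)


values = [0.22, 0.31, 0.91, 0.48, 0.2, 0.09, 0.9, 0.71, 0.73,
          0.65, 0.16, 0.9, 0.75, 0.87, 0.72, 0.68, 0.54, 0.48]
labels = np.repeat(np.arange(1, 7), 3)
df = pd.DataFrame(np.column_stack([values, labels]))  # column 1 is float64 here
totals = sum_by_position_reshape(df)
print(totals)
```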

Michael Szczesny

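For problems like this the following pattern is often useful. To group every n-th row together, you can use df.index % n, provided the index consists of consecutive integers starting at 0 (such as the default RangeIndex) and every group has exactly n rows.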

n = 3
df.groupby(df.index % n)[0].sum()

Output:

0    3.68
1    2.79
2    3.83

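The modulo trick reads the index values themselves, so an index that does not start at 0 rotates the labels. A minimal sketch, assuming a made-up index of 100..117 (as might remain after slicing): grouping by a positional counter from np.arange keeps the same idea while ignoring the index. Like df.index % n, it still assumes every group has exactly n rows stored consecutively.

```python
import numpy as np
import pandas as pd

values = [0.22, 0.31, 0.91, 0.48, 0.2, 0.09, 0.9, 0.71, 0.73,
          0.65, 0.16, 0.9, 0.75, 0.87, 0.72, 0.68, 0.54, 0.48]
labels = np.repeat(np.arange(1, 7), 3)
# Same data, but with an index that does not start at 0 (illustrative).
df = pd.DataFrame({0: values, 1: labels}, index=range(100, 118))

n = 3
# df.index % n would start at 100 % 3 == 1, so the labels would be rotated.
# A positional counter repeats 0, 1, ..., n-1 regardless of the index.
by_position = df.groupby(np.arange(len(df)) % n)[0].sum()
print(by_position)
```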