Grouping/Summarising and summing a data-frame based on another column

Question

I have a dataframe (df) created from a numpy array that looks like:

I want to create a summary data-frame that sums the values in the first column according to their relative position within each group of the second column. So my desired summary data-frame output from the above example would look like:

3.68    total
2.79    total
3.83    total

Where: the first value in the summary data-frame would be equal to: 0.22+0.48+0.9+0.65+0.75+0.68=3.68

the second value in the summary data-frame would be equal to: 0.31+0.2+0.71+0.16+0.87+0.54=2.79

the third value in the summary data-frame would be equal to: 0.91+0.09+0.73+0.9+0.72+0.48=3.83

Quang Hoang · Accepted Answer

You can use groupby twice: once with cumcount to label each row's relative position within its group, and once to sum the values by that position:

df[0].groupby(df.groupby(df[1]).cumcount()).sum()

Output:

0    3.68
1    2.79
2    3.83
Name: 0, dtype: float64
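
For anyone who wants to try this, here is a minimal reproducible sketch. The original example frame is not shown above, so the values below are reconstructed from the sums quoted in the question, and the group labels 1 to 6 (three rows each) are an assumption based on the use of df[1].max() in Option 2.

```python
import pandas as pd

# Values reconstructed from the sums quoted in the question (assumed row order).
values = [0.22, 0.31, 0.91, 0.48, 0.20, 0.09, 0.90, 0.71, 0.73,
          0.65, 0.16, 0.90, 0.75, 0.87, 0.72, 0.68, 0.54, 0.48]
# Assumed group labels: 1..6, three consecutive rows per group.
labels = [g for g in range(1, 7) for _ in range(3)]
df = pd.DataFrame({0: values, 1: labels})

# First groupby: 0-based position of each row within its group.
position = df.groupby(df[1]).cumcount()

# Second groupby: sum column 0 across groups for each position.
summary = df[0].groupby(position).sum()
print(summary.round(2))
```

Because cumcount numbers rows in the order they appear, this does not need the groups to be sorted or equally sized; a position that only exists in some groups is summed over just those groups.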

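Option 2: If all groups have the same number of elements, the rows are ordered group by group, and the groups are labelled 1 to n (so that df[1].max() is the number of groups), we can just reshape: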

df[0].values.reshape(df[1].max(),-1).sum(0)
# out
# array([3.68, 2.79, 3.83])
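
If the labels are not 1 to n, the reshape idea can still be used by counting the groups instead of taking the maximum label. The following is only a sketch on a small hypothetical frame (string labels "a" and "b", made-up values), with a guard for unequal group sizes:

```python
import pandas as pd

# Hypothetical data: two groups with string labels, three rows each.
df = pd.DataFrame({0: [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
                   1: ["a", "a", "a", "b", "b", "b"]})

# Reshaping is only valid when every group has the same number of rows.
sizes = df.groupby(df[1]).size()
if sizes.nunique() != 1:
    raise ValueError("groups differ in size; use the cumcount approach instead")

# A stable sort keeps the within-group order while making groups contiguous.
ordered = df.sort_values(1, kind="mergesort")

# One row per group, one column per position; sum down the rows.
n_groups = df[1].nunique()
totals = ordered[0].to_numpy().reshape(n_groups, -1).sum(axis=0)
print(totals)
```

Here totals holds one sum per within-group position, the same quantity the cumcount version computes.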
