Reputation: 5107
I have a dataframe (df) created from a numpy array that looks like:
0.22 1
0.31 1
0.91 1
0.48 2
0.2 2
0.09 2
0.9 3
0.71 3
0.73 3
0.65 4
0.16 4
0.9 4
0.75 5
0.87 5
0.72 5
0.68 6
0.54 6
0.48 6
I want to create a summary dataframe that sums the values in the first column by their position within each group of the second column (all first elements together, all second elements together, and so on). So my desired summary dataframe for the above example would look like:
3.68 total
2.79 total
3.83 total
Where:
the first value in the summary data-frame would be equal to: 0.22+0.48+0.9+0.65+0.75+0.68=3.68
the second value in the summary data-frame would be equal to: 0.31+0.2+0.71+0.16+0.87+0.54=2.79
the third value in the summary data-frame would be equal to: 0.91+0.09+0.73+0.9+0.72+0.48=3.83
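For reproducibility, here is a minimal sketch of how the example could be built. It assumes the numpy array has two columns (values, then group labels), so the dataframe gets the default integer column labels 0 and 1; the names values and groups are only illustrative.

```python
import numpy as np
import pandas as pd

# Values from the example, in row order
values = [0.22, 0.31, 0.91, 0.48, 0.2, 0.09, 0.9, 0.71, 0.73,
          0.65, 0.16, 0.9, 0.75, 0.87, 0.72, 0.68, 0.54, 0.48]
# Group labels 1..6, each repeated three times
groups = np.repeat(np.arange(1, 7), 3)

# A 2-D numpy array has a single dtype, so both columns end up as floats
df = pd.DataFrame(np.column_stack([values, groups]))
print(df.head())
```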
Upvotes: 1
Views: 46
Reputation: 150755
You can use groupby twice: once to label each row's relative position within its group, and once to sum by that position:
df[0].groupby(df.groupby(df[1]).cumcount()).sum()
Output:
0 3.68
1 2.79
2 3.83
Name: 0, dtype: float64
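To make the two steps easier to follow, the one-liner can be split up as in the sketch below. It rebuilds the example data under the assumption that the columns are labelled 0 and 1; the names pos and summary are just illustrative.

```python
import pandas as pd

values = [0.22, 0.31, 0.91, 0.48, 0.2, 0.09, 0.9, 0.71, 0.73,
          0.65, 0.16, 0.9, 0.75, 0.87, 0.72, 0.68, 0.54, 0.48]
groups = [g for g in range(1, 7) for _ in range(3)]
df = pd.DataFrame({0: values, 1: groups})

# Step 1: number the rows inside each group of column 1 (0, 1, 2, 0, 1, 2, ...)
pos = df.groupby(1).cumcount()

# Step 2: sum column 0 over all rows that share the same within-group position
summary = df[0].groupby(pos).sum()
print(summary)
```

Because the position is computed per group, this form should also cope with groups of different sizes; later positions simply sum over fewer rows.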
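Option 2: If all groups have the same number of elements and the rows are ordered by group, we can just reshape (nunique() returns a plain int, which reshape needs even when column 1 holds floats):
df[0].values.reshape(df[1].nunique(),-1).sum(0)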
# out
# array([3.68, 2.79, 3.83])
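As a hedged sketch of the reshape approach, the snippet below first checks the assumption it relies on (every group has the same number of rows, and rows are contiguous per group) before reshaping. The data is rebuilt from the question, and the names sizes, n_groups and totals are illustrative.

```python
import pandas as pd

values = [0.22, 0.31, 0.91, 0.48, 0.2, 0.09, 0.9, 0.71, 0.73,
          0.65, 0.16, 0.9, 0.75, 0.87, 0.72, 0.68, 0.54, 0.48]
groups = [g for g in range(1, 7) for _ in range(3)]
df = pd.DataFrame({0: values, 1: groups})

# The reshape trick is only valid when every group has the same size
sizes = df.groupby(1).size()
if sizes.nunique() != 1:
    raise ValueError("groups differ in size; use the groupby/cumcount approach")

n_groups = len(sizes)  # a plain int, safe to pass to reshape

# One row per group, one column per within-group position; sum down the rows
totals = df[0].to_numpy().reshape(n_groups, -1).sum(axis=0)
print(totals)
```

The result is a plain numpy array rather than a Series, so it carries no index labels.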
Upvotes: 2
Reputation: 5036
For problems like this, the following pattern is often useful: to group every n-th row together, you can use df.index % n if your index is sorted and consecutive.
n = 3
df.groupby(df.index % n)[0].sum()
Output:
0 3.68
1 2.79
2 3.83
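If the index is not a sorted, consecutive integer range, one possible variant is to group by row position instead of by the index. The sketch below assumes every group has exactly n rows and uses a made-up string index purely for illustration.

```python
import numpy as np
import pandas as pd

values = [0.22, 0.31, 0.91, 0.48, 0.2, 0.09, 0.9, 0.71, 0.73,
          0.65, 0.16, 0.9, 0.75, 0.87, 0.72, 0.68, 0.54, 0.48]
groups = [g for g in range(1, 7) for _ in range(3)]
# Hypothetical non-numeric index, so df.index % n would not work here
df = pd.DataFrame({0: values, 1: groups}, index=list("abcdefghijklmnopqr"))

n = 3
# Row positions 0..len(df)-1, independent of whatever the index holds
pos = np.arange(len(df)) % n

summary = df.groupby(pos)[0].sum()
print(summary)
```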
Upvotes: 0