Reputation: 4506
I have the following dataset in pandas:
import pandas as pd
seq = [1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2]
event_no = [5, 5, 5, 6, 6, 6, 4, 4, 4, 3, 3, 3, 1, 1, 1, 2, 2, 2]
points_no = [1, 1, 1, None, None, None, 1, 1, 1, 1, 1, 1, None, None, None, 1, 1, 1]
df = pd.DataFrame({"seq" : seq, "event_no": event_no, "points_no": points_no})
seq event_no points_no
0 1 5 1.0
1 1 5 1.0
2 1 5 1.0
3 1 6 NaN
4 1 6 NaN
5 1 6 NaN
6 1 4 1.0
7 1 4 1.0
8 1 4 1.0
9 2 3 1.0
10 2 3 1.0
11 2 3 1.0
12 2 1 NaN
13 2 1 NaN
14 2 1 NaN
15 2 2 1.0
16 2 2 1.0
17 2 2 1.0
I group it by seq, then event_no, and then sum over points_no:
df2 = df.groupby(['seq', 'event_no']).points_no.sum().reset_index()
The output below doesn't preserve the original order of the values in column event_no; instead, they are sorted in ascending order:
seq event_no points_no
0 1 4 3.0
1 1 5 3.0
2 1 6 0.0
3 2 1 0.0
4 2 2 3.0
5 2 3 3.0
What I actually want is this output:
seq event_no points_no
0 1 5 3.0
1 1 6 0.0
2 1 4 3.0
3 2 3 3.0
4 2 1 0.0
5 2 2 3.0
Is there a way to get the result as stated while preserving index order?
Upvotes: 6
Views: 4813
Reputation: 3276
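Use the parameter sort=False in groupby. By default, groupby sorts the group keys; with sort=False, the groups appear in the order in which each key is first seen: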
df.groupby(['seq', 'event_no'], sort=False).points_no.sum().reset_index()
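For completeness, here is a minimal self-contained sketch that rebuilds the question's DataFrame and applies sort=False. It reuses the question's variable names (df, df2) and relies on the default behavior of sum, which skips NaN and so returns 0.0 for the groups that contain only NaN.

```python
import pandas as pd

# Data copied from the question.
seq = [1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2]
event_no = [5, 5, 5, 6, 6, 6, 4, 4, 4, 3, 3, 3, 1, 1, 1, 2, 2, 2]
points_no = [1, 1, 1, None, None, None, 1, 1, 1,
             1, 1, 1, None, None, None, 1, 1, 1]
df = pd.DataFrame({"seq": seq, "event_no": event_no, "points_no": points_no})

# sort=False keeps the groups in order of first appearance
# instead of sorting the (seq, event_no) keys in ascending order.
df2 = (
    df.groupby(["seq", "event_no"], sort=False)["points_no"]
      .sum()
      .reset_index()
)
print(df2)
```

With this data, event_no in df2 should come out as 5, 6, 4, 3, 1, 2, matching the desired output in the question.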
Upvotes: 7