Khalil Al Hooti

Reputation: 4506

pandas groupby does not preserve order?

I have the following dataset in pandas:

import pandas as pd

seq = [1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2]
event_no = [5, 5, 5, 6, 6, 6, 4, 4, 4, 3, 3, 3, 1, 1, 1, 2, 2, 2]
points_no = [1, 1, 1, None, None, None, 1, 1, 1, 1, 1, 1, None, None, None, 1, 1, 1]

df = pd.DataFrame({"seq": seq, "event_no": event_no, "points_no": points_no})

    seq  event_no  points_no
0     1         5        1.0
1     1         5        1.0
2     1         5        1.0
3     1         6        NaN
4     1         6        NaN
5     1         6        NaN
6     1         4        1.0
7     1         4        1.0
8     1         4        1.0
9     2         3        1.0
10    2         3        1.0
11    2         3        1.0
12    2         1        NaN
13    2         1        NaN
14    2         1        NaN
15    2         2        1.0
16    2         2        1.0
17    2         2        1.0

I group it by seq then event_no and then sum over points_no...

df2 = df.groupby(['seq', 'event_no']).points_no.sum().reset_index()

The output below does not preserve the original order of the values in column event_no; instead, they are sorted in ascending order:

   seq  event_no  points_no
0    1         4        3.0
1    1         5        3.0
2    1         6        0.0
3    2         1        0.0
4    2         2        3.0
5    2         3        3.0

What I actually want is this output:

   seq  event_no  points_no
0    1         5        3.0
1    1         6        0.0
2    1         4        3.0
3    2         3        3.0
4    2         1        0.0
5    2         2        3.0

Is there a way to get this result, with the groups kept in their original order?

Upvotes: 6

Views: 4813

Answers (1)

rudolfovic

Reputation: 3276

groupby sorts the group keys by default. Pass sort=False to keep the groups in the order in which they first appear:

df.groupby(['seq', 'event_no'], sort=False).points_no.sum().reset_index()
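
For anyone who wants to check this, here is a small self-contained sketch that rebuilds the DataFrame from the question and applies sort=False. The variable name result is only for illustration and does not appear in the question.

```python
import pandas as pd

# Rebuild the DataFrame from the question.
seq = [1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2]
event_no = [5, 5, 5, 6, 6, 6, 4, 4, 4, 3, 3, 3, 1, 1, 1, 2, 2, 2]
points_no = [1, 1, 1, None, None, None, 1, 1, 1, 1, 1, 1, None, None, None, 1, 1, 1]

df = pd.DataFrame({"seq": seq, "event_no": event_no, "points_no": points_no})

# sort=False keeps the groups in order of first appearance
# instead of sorting the group keys in ascending order.
result = (
    df.groupby(["seq", "event_no"], sort=False)["points_no"]
    .sum()
    .reset_index()
)

print(result)
```

With the data above, event_no should come out as 5, 6, 4, 3, 1, 2, which matches the desired output in the question. The all-NaN groups sum to 0.0 because sum skips NaN by default.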

Upvotes: 7
