Get order of subgroups in pandas dataframe

Question

I have a pandas dataframe that looks something like this:

df = pd.DataFrame({'Name' : ['Kate', 'John', 'Peter','Kate', 'John', 'Peter'],'Distance' : [23,16,32,15,31,26], 'Time' : [3,5,2,7,9,4]})
df


  Distance  Name   Time
0   23      Kate    3
1   16      John    5
2   32      Peter   2
3   15      Kate    7
4   31      John    9
5   26      Peter   2

I want to add a column that tells me, for each Name, what's the order of the times.

I want something like this:

 Order  Distance    Name    Time
    0       16      John    5
    1       31      John    9
    0       23      Kate    3
    1       15      Kate    7
    0       32      Peter   2
    1       26      Peter   4

I can do it using a for loop:

df2 = df[df['Name'] == 'aaa'].reset_index().reset_index() # I did this just to create an empty data frame with the columns I want

for name, row in df.groupby('Name').count().iterrows():
    table = df[df['Name'] == name].sort_values('Time').reset_index().reset_index()
    to_concat = [df2,table]
    df2 = pd.concat(to_concat)

df2.drop('index', axis = 1, inplace = True)
df2.columns = ['Order', 'Distance', 'Name', 'Time']
df2

This works, the problem is (apart from being very unpythonic), for large tables (my actual table has about 50 thousand rows) it takes about half an hour to run.

Can someone help me write this in a simpler way that runs faster? I'm sorry if this has been answered somewhere, but I didn't really know how to search for it.

Best,

jezrael · Accepted Answer

Use sort_values with cumcount:

df = df.sort_values(['Name','Time'])
df['Order'] = df.groupby('Name').cumcount()
print (df)

   Distance   Name  Time  Order
1        16   John     5      0
4        31   John     9      1
0        23   Kate     3      0
3        15   Kate     7      1
2        32  Peter     2      0
5        26  Peter     4      1

If need first column use insert:

df = df.sort_values(['Name','Time'])
df.insert(0, 'Order', df.groupby('Name').cumcount())
print (df)
   Order  Distance   Name  Time
1      0        16   John     5
4      1        31   John     9
0      0        23   Kate     3
3      1        15   Kate     7
2      0        32  Peter     2
5      1        26  Peter     4

Get order of subgroups in pandas dataframe

Answers (2)

Related Questions