Add column with numbering of elements with respect to a groupby operation without loops

Question

I managed to add a column to my pandas dataframe that numbers the rows within each group.

This is the input dataframe:

import pandas as pd

df = pd.DataFrame({
    'name': ['Name1', 'Name2', 'Name3', 'Name4', 'Name5', 'Name6', 'Name7', 'Name8'],
    'group': ['GroupB', 'GroupB', 'GroupB', 'GroupA', 'GroupA', 'GroupA', 'GroupC', 'GroupC'],
    'revenue': [1, 2, 3, 4, 5, 6, 11, 22],
})

It looks like this:

    group   name    revenue
0   GroupB  Name1   1
1   GroupB  Name2   2
2   GroupB  Name3   3
3   GroupA  Name4   4
4   GroupA  Name5   5
5   GroupA  Name6   6
6   GroupC  Name7   11
7   GroupC  Name8   22

I want output like this:

    group   name    revenue Group_internal_id
0   GroupA  Name4   4   0
1   GroupA  Name5   5   1
2   GroupA  Name6   6   2
3   GroupB  Name1   1   0
4   GroupB  Name2   2   1
5   GroupB  Name3   3   2
6   GroupC  Name7   11  0
7   GroupC  Name8   22  1

I managed to get the output I wanted in dataframe outDF with the following code:

numbering_function = lambda x: range(len(x.index))

# DataFrame.append was removed in pandas 2.0, so collect the pieces and concatenate once
pieces = []
for iK, iDF in df.groupby('group'):
    tempDF = iDF.copy()
    tempDF['Group_internal_id'] = numbering_function(tempDF)
    pieces.append(tempDF)
outDF = pd.concat(pieces, ignore_index=True)

Then outDF looks as follows:

    group   name    revenue Group_internal_id
0   GroupA  Name4   4   0
1   GroupA  Name5   5   1
2   GroupA  Name6   6   2
3   GroupB  Name1   1   0
4   GroupB  Name2   2   1
5   GroupB  Name3   3   2
6   GroupC  Name7   11  0
7   GroupC  Name8   22  1

I would like to find a way to obtain the same output dataframe without using a loop.

Thanks!

jezrael · Accepted Answer

Use cumcount to number the rows within each group, then sort_values to order the result by group:

df['new'] = df.groupby('group').cumcount()
df = df.sort_values('group')
print(df)
    group   name  revenue  new
3  GroupA  Name4        4    0
4  GroupA  Name5        5    1
5  GroupA  Name6        6    2
0  GroupB  Name1        1    0
1  GroupB  Name2        2    1
2  GroupB  Name3        3    2
6  GroupC  Name7       11    0
7  GroupC  Name8       22    1
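
The output above keeps the original index labels and names the column new. To reproduce the exact layout from the question (a Group_internal_id column and a fresh 0..7 index), a minimal sketch could look like the following. It assumes the input dataframe from the question. The kind='mergesort' argument and the reset_index call are additions of mine, not part of the answer above: mergesort is a stable sort, so rows keep their original relative order inside each group, which the default sort algorithm does not guarantee.

```python
import pandas as pd

# Input dataframe from the question
df = pd.DataFrame({
    'name': ['Name1', 'Name2', 'Name3', 'Name4', 'Name5', 'Name6', 'Name7', 'Name8'],
    'group': ['GroupB', 'GroupB', 'GroupB', 'GroupA', 'GroupA', 'GroupA', 'GroupC', 'GroupC'],
    'revenue': [1, 2, 3, 4, 5, 6, 11, 22],
})

# cumcount numbers the rows of each group 0, 1, 2, ... in their original order
df['Group_internal_id'] = df.groupby('group').cumcount()

# Stable sort by group (assumption: within-group order should be preserved),
# then discard the old index labels so the index runs 0..7 as in the question
outDF = df.sort_values('group', kind='mergesort').reset_index(drop=True)
print(outDF)
```

Because cumcount is computed before sorting, the numbering reflects the row order of the original dataframe rather than the sorted one.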
