Reputation: 1195
I managed to add a column to my pandas DataFrame that numbers the rows within each group.
This is the input dataframe:
import pandas as pd

df = pd.DataFrame({
    'name': ['Name1', 'Name2', 'Name3', 'Name4', 'Name5', 'Name6', 'Name7', 'Name8'],
    'group': ['GroupB', 'GroupB', 'GroupB', 'GroupA', 'GroupA', 'GroupA', 'GroupC', 'GroupC'],
    'revenue': [1, 2, 3, 4, 5, 6, 11, 22],
})
It looks like this:
group name revenue
0 GroupB Name1 1
1 GroupB Name2 2
2 GroupB Name3 3
3 GroupA Name4 4
4 GroupA Name5 5
5 GroupA Name6 6
6 GroupC Name7 11
7 GroupC Name8 22
I want output like this:
group name revenue Group_internal_id
0 GroupA Name4 4 0
1 GroupA Name5 5 1
2 GroupA Name6 6 2
3 GroupB Name1 1 0
4 GroupB Name2 2 1
5 GroupB Name3 3 2
6 GroupC Name7 11 0
7 GroupC Name8 22 1
I managed to get the output I wanted in dataframe outDF with the following code:
numbering_function = lambda x: range(len(x.index))
parts = []
for iK, iDF in df.groupby('group'):
    tempDF = iDF.copy()
    tempDF['Group_internal_id'] = numbering_function(tempDF)
    parts.append(tempDF)
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
outDF = pd.concat(parts, ignore_index=True)
outDF then looks as follows:
group name revenue Group_internal_id
0 GroupA Name4 4 0
1 GroupA Name5 5 1
2 GroupA Name6 6 2
3 GroupB Name1 1 0
4 GroupB Name2 2 1
5 GroupB Name3 3 2
6 GroupC Name7 11 0
7 GroupC Name8 22 1
I would like to find a way to obtain the same output dataframe without using a loop.
Thanks!
Upvotes: 2
Views: 40
Reputation: 862681
You need cumcount with sort_values:
df['new'] = df.groupby('group').cumcount()
df = df.sort_values('group')
print(df)
group name revenue new
3 GroupA Name4 4 0
4 GroupA Name5 5 1
5 GroupA Name6 6 2
0 GroupB Name1 1 0
1 GroupB Name2 2 1
2 GroupB Name3 3 2
6 GroupC Name7 11 0
7 GroupC Name8 22 1
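Note that the output above keeps the original index labels and names the column new. If you want exactly the frame from the question (fresh 0..7 index, column called Group_internal_id), something like the following should work. It is only a sketch: the variable name out is my own, and I am assuming a stable sort (mergesort) is acceptable so that rows keep their original order inside each group.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Name1', 'Name2', 'Name3', 'Name4', 'Name5', 'Name6', 'Name7', 'Name8'],
    'group': ['GroupB', 'GroupB', 'GroupB', 'GroupA', 'GroupA', 'GroupA', 'GroupC', 'GroupC'],
    'revenue': [1, 2, 3, 4, 5, 6, 11, 22],
})

# Sort by group with a stable algorithm so the row order inside each group
# is preserved, then discard the old index labels.
out = df.sort_values('group', kind='mergesort').reset_index(drop=True)

# cumcount numbers the rows of each group 0, 1, 2, ... in their current order.
out['Group_internal_id'] = out.groupby('group').cumcount()

print(out)
```

Because cumcount counts rows in the order they appear within each group, it can be called before or after the sort; it is done after here only so the new column ends up last on an already tidy frame.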
Upvotes: 2