Reputation: 77
Let's say I have the following DataFrame:
df = pd.DataFrame({"my_col": ["one","two","two","one","two","one","one"]})
my_col
0 one
1 two
2 two
3 one
4 two
5 one
6 one
I would like append a string on the duplicated values with their duplicate count. Here is what I mean:
my_col
0 one_0
1 two_0
2 two_1
3 one_1
4 two_2
5 one_2
6 one_3
I know I could do something like df.groupby('my_col').apply(my_function_to_do_this) with something like this :
def my_function_to_do_this(group: pd.DataFrame) -> pd.DataFrame:
str_to_append = pd.Series(range(group.shape[0]), index=group.index).astype(str)
group["my_col"] += "_" + str_to_append
return group
but that's quite slow on a large DataFrame with a lot of small groups of like 4 rows maximum.
I'm trying to find a faster approach if any.
Many thanks in advance for the help !
Upvotes: 3
Views: 129
Reputation: 862521
Use GroupBy.cumcount
for counter, convert to strings and add to original with Series.str.cat
:
df['my_col'] = df['my_col'].str.cat(df.groupby('my_col').cumcount().astype(str), sep='_')
print (df)
my_col
0 one_0
1 two_0
2 two_1
3 one_1
4 two_2
5 one_2
6 one_3
Or join by +
:
df['my_col'] += '_' + df.groupby('my_col').cumcount().astype(str)
#longer version
#df['my_col'] = df['my_col'] + '_' + df.groupby('my_col').cumcount().astype(str)
Upvotes: 6