Reputation: 36624
I have a pd.DataFrame full of picture names. Often, the image names are repeated, but the repeats are always next to each other. This is what it looks like:
import pandas as pd
from numpy.random import randint
df = pd.DataFrame(sorted(['image_{}'.format(randint(4)) for i in range(10)]),
                  columns=['Image Name'])
print(df)
Out[6]:
Image Name
0 image_0
1 image_0
2 image_0
3 image_1
4 image_1
5 image_2
6 image_2
7 image_2
8 image_3
9 image_3
Because I will save the images under these names, I want to append a cumulative count to each string, like this:
Out[7]:
Image Name
0 image_0_1
1 image_0_2
2 image_0_3
3 image_1_1
4 image_1_2
5 image_2_1
6 image_2_2
7 image_2_3
8 image_3_1
9 image_3_2
How can I proceed? I'm guessing some combination of groupby and cumcount?
Upvotes: 2
Views: 351
Reputation: 109546
df['new_name'] = (
    df
    .groupby('Image Name')['Image Name']
    .transform(lambda images: [image + f'_{n + 1}' for n, image in enumerate(images)])
)
>>> df
Image Name new_name
0 image_0 image_0_1
1 image_0 image_0_2
2 image_0 image_0_3
3 image_1 image_1_1
4 image_1 image_1_2
5 image_2 image_2_1
6 image_2 image_2_2
7 image_2 image_2_3
8 image_3 image_3_1
9 image_3 image_3_2
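For a reproducible check of the transform approach above, here is a self-contained sketch that swaps the random names for a fixed, hypothetical list with the same shape as the question's sample output:

```python
import pandas as pd

# Fixed, hypothetical sample data mirroring the sorted names in the question
df = pd.DataFrame(
    ['image_0'] * 3 + ['image_1'] * 2 + ['image_2'] * 3 + ['image_3'] * 2,
    columns=['Image Name'],
)

# Within each group of identical names, append a 1-based position suffix.
# transform() places the returned list back onto the group's original index.
df['new_name'] = (
    df
    .groupby('Image Name')['Image Name']
    .transform(lambda images: [image + f'_{n + 1}' for n, image in enumerate(images)])
)

print(df)
```

Because the lambda only sees one group at a time, `enumerate` restarts at 0 for every distinct name; the `+ 1` makes the suffix start at 1 as the question asks.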
Upvotes: 3
Reputation: 107652
Consider groupby().cumcount() and concatenate its result to the original string. This works regardless of row order, so the duplicates do not need to be adjacent:
df['Image Name'] = (
    df['Image Name'] + '_'
    + (df.groupby('Image Name').cumcount() + 1).astype(str)
)
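To illustrate the point about row order, the sketch below applies the same cumcount idea to a small, hypothetical list in which the duplicate names are interleaved rather than adjacent:

```python
import pandas as pd

# Hypothetical unsorted data: the duplicates are NOT next to each other here
df = pd.DataFrame({'Image Name': ['image_1', 'image_0', 'image_1', 'image_0', 'image_0']})

# cumcount() numbers each row within its group, starting at 0, in the order
# the rows appear; adding 1 makes the suffix 1-based
suffix = (df.groupby('Image Name').cumcount() + 1).astype(str)
df['Image Name'] = df['Image Name'] + '_' + suffix

print(df['Image Name'].tolist())
# ['image_1_1', 'image_0_1', 'image_1_2', 'image_0_2', 'image_0_3']
```

Each name is counted in the order it appears, so every name's suffixes still start at 1 even though the rows are not sorted.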
Upvotes: 5