I have the following dataframe:
Sometimes the index is duplicated, and in that case I want to change the value in the column 'Hotspot'. For index 0, the first prograding_feature_polygon_30 should be changed to prograding_feature_polygon_30_1 and the second to prograding_feature_polygon_30_2.
The same applies to index 1, so there the values should again be changed to prograding_feature_polygon_30_1 and prograding_feature_polygon_30_2, and so on.
Indexes are not always duplicated; when they are not, the value in Hotspot should remain the same. Does anyone know an easy way to do this?
Regards,
Dante
Data sample
import pandas as pd
import numpy as np
df = pd.DataFrame({'a': np.repeat([*'ABCD'], [2, 1, 3, 1]),
                   'b': [*range(7)]},
                  index=np.repeat([*range(4)], [2, 1, 3, 1]))
print(df)
   a  b
0  A  0
0  A  1
1  B  2
2  C  3
2  C  4
2  C  5
3  D  6
Problem
For each duplicate in the index, we want to add a consecutive number to the values in column a. So, A_1, A_2 for index value 0, and C_1, C_2, C_3 for index value 2. Values without duplicates (1 and 3) should be unaffected.
Solution
df.a = np.where(df.index.duplicated(keep=False),
                df.a + '_' + df.groupby(level=0).cumcount().add(1).astype(str),
                df.a)
print(df)
     a  b
0  A_1  0
0  A_2  1
1    B  2
2  C_1  3
2  C_2  4
2  C_3  5
3    D  6
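The same approach should carry over to the 'Hotspot' column from the question. The dataframe in the question is not shown, so the data below is a made-up stand-in (the name hotspots, the index values, and the prograding_feature_polygon_31 row are assumptions for illustration only):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's dataframe, which is not shown in the post.
hotspots = pd.DataFrame(
    {"Hotspot": ["prograding_feature_polygon_30"] * 4
                + ["prograding_feature_polygon_31"]},
    index=[0, 0, 1, 1, 2],
)

# True for every row whose index value occurs more than once.
is_dup = hotspots.index.duplicated(keep=False)

# "_1", "_2", ... numbered separately within each index value.
suffix = "_" + hotspots.groupby(level=0).cumcount().add(1).astype(str)

# Append the suffix only where the index is duplicated.
hotspots["Hotspot"] = np.where(is_dup,
                               hotspots["Hotspot"] + suffix,
                               hotspots["Hotspot"])
print(hotspots)
```

Rows at index 0 and 1 get the _1 / _2 suffixes, while the single row at index 2 keeps its original value.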
Explanation
Use df.index.duplicated with param keep=False to get an array with True for duplicates, False for non-duplicates.
Use np.where. If True, we want df.a + consecutive number, else simply df.a.
Use df.groupby on the index, and apply .cumcount to enumerate items per group. Use add(1) to start at 1, instead of 0. Finally, use astype(str), in view of the concatenation with df.a.
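To see what each piece contributes, here is a small sketch that rebuilds the sample dataframe and prints the intermediate results (the variable names mask and counter are only illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': np.repeat([*'ABCD'], [2, 1, 3, 1]),
                   'b': [*range(7)]},
                  index=np.repeat([*range(4)], [2, 1, 3, 1]))

# Boolean array: True wherever the index value appears more than once.
mask = df.index.duplicated(keep=False)
print(mask.tolist())
# [True, True, False, True, True, True, False]

# Running count within each index value, shifted to start at 1.
counter = df.groupby(level=0).cumcount().add(1)
print(counter.tolist())
# [1, 2, 1, 1, 2, 3, 1]
```

Note that counter is also 1 for the non-duplicated rows (index 1 and 3), which is why the mask is needed in np.where to leave those values untouched.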