Reputation: 729
I am using pandas package to deal with my data, and I have a dataframe looks like below.
data = pd.read_csv('people.csv')
id, A, B
John, 1, 3
Mary, 2, 5
John, 4, 6
John, 3, 7
Mary, 5, 2
I'd like to produce the unique id for those duplicates but keep the same order of them.
id, A, B
John, 1, 3
Mary, 2, 5
John.1, 4, 6
John.2, 3, 7 # John shows up three times.
Mary.1, 5, 2 # Mary shows up twice.
I tried something like set_index
, pd.factorize()
and index_col
but they do not work.
Upvotes: 1
Views: 555
Reputation: 77951
In order to obtain the indices you may use GroupBy.cumcount
:
>>> idx = df.groupby('id').cumcount()
>>> idx
0 0
1 0
2 1
3 2
4 1
dtype: int64
The non-zero ones may be appended by:
>>> mask = idx != 0
>>> df.loc[mask, 'id'] += '.' + idx[mask].astype('str')
>>> df
id A B
0 John 1 3
1 Mary 2 5
2 John.1 4 6
3 John.2 3 7
4 Mary.1 5 2
Upvotes: 2