KHURRAM

Reputation: 19

I want to assign the same id to duplicate rows. Starting from this DataFrame:

id    name     age      year
0     khu       12      2018
1     she       21      2019
2     waqar     22      2015
3     khu       12      2018
4     she       21      2018
5     waqar     22      2015

I want to get this result:

id    name     age      year
0     khu       12      2018
1     she       21      2019
2     waqar     22      2015
0     khu       12      2018
1     she       21      2018
2     waqar     22      2015
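To try the answers below, the sample frame can be rebuilt as follows. This is a sketch that assumes the id column in the question is an existing data column holding the row position (0 to 5), which the answers then overwrite; that matches their printed output, where id appears as the first column.

```python
import pandas as pd

# Sample data from the question. The 'id' column is assumed to hold the
# row position (0..5); the answers below overwrite it with group ids.
df = pd.DataFrame({
    'id': list(range(6)),
    'name': ['khu', 'she', 'waqar', 'khu', 'she', 'waqar'],
    'age': [12, 21, 22, 12, 21, 22],
    'year': [2018, 2019, 2015, 2018, 2018, 2015],
})
print(df)
```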

Upvotes: 1

Views: 605

Answers (2)

BENY

Reputation: 323226

Use factorize. As alternatives, you can also look at the category dtype with cat.codes, or sklearn's LabelEncoder.

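import pandas as pd

# factorize returns (codes, uniques); the codes number each distinct
# name in order of first appearance
df['id'] = pd.factorize(df['name'])[0]
print(df)
   id   name  age  year
0   0    khu   12  2018
1   1    she   21  2019
2   2  waqar   22  2015
3   0    khu   12  2018
4   1    she   21  2018
5   2  waqar   22  2015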
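A minimal sketch of the cat.codes alternative, with the question's data rebuilt so the snippet runs on its own. It also shows one difference worth knowing: factorize numbers values by first appearance, while cat.codes follows the sorted order of the categories. On the question's data the two agree only because the names happen to appear in sorted order. The column names id_factorize and id_codes and the series other are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['khu', 'she', 'waqar', 'khu', 'she', 'waqar'],
    'age': [12, 21, 22, 12, 21, 22],
    'year': [2018, 2019, 2015, 2018, 2018, 2015],
})

# factorize: ids in order of first appearance
df['id_factorize'] = pd.factorize(df['name'])[0]

# category codes: ids in sorted category order
df['id_codes'] = df['name'].astype('category').cat.codes
print(df)

# With names not already in sorted order, the two approaches differ
other = pd.Series(['waqar', 'khu', 'waqar'])
print(pd.factorize(other)[0])                       # [0 1 0]
print(other.astype('category').cat.codes.tolist())  # [1, 0, 1]
```

sklearn's LabelEncoder behaves like cat.codes here, since its classes_ are stored in sorted order. If the ids must follow the order in which rows first appear, factorize is the one to use.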

Upvotes: 3

jezrael

Reputation: 862541

Use GroupBy.ngroup:

import pandas as pd

# ngroup numbers each group; sort=False numbers groups in order of first appearance
df['id'] = df.groupby('name', sort=False).ngroup()
# to check duplicates across multiple columns, group by all of them:
#df['id'] = df.groupby(['name','age'], sort=False).ngroup()
print(df)
   id   name  age  year
0   0    khu   12  2018
1   1    she   21  2019
2   2  waqar   22  2015
3   0    khu   12  2018
4   1    she   21  2018
5   2  waqar   22  2015
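The commented multi-column variant matters for which rows count as duplicates. A sketch on the question's data (rebuilt here so it runs alone): grouping by name and age reproduces the desired ids, but grouping by every column does not, because rows 1 and 4 differ in year and so 'she' gets two ids. The column names id_name_age and id_all are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['khu', 'she', 'waqar', 'khu', 'she', 'waqar'],
    'age': [12, 21, 22, 12, 21, 22],
    'year': [2018, 2019, 2015, 2018, 2018, 2015],
})

# Duplicates defined by name and age: matches the desired output
df['id_name_age'] = df.groupby(['name', 'age'], sort=False).ngroup()

# Duplicates defined by all columns: ('she', 21, 2018) is a new group
df['id_all'] = df.groupby(['name', 'age', 'year'], sort=False).ngroup()

print(df)
```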

Upvotes: 5
