Reputation: 181
I have a large dataframe with the columns DOB and ID:
import pandas as pd
df = pd.read_csv('data.csv')
df.head()
ID DOB
223725 1975.0
223725 1975.0
223725 1975.0
223725 1975.0
223725 1975.0
There are 63 different years in DOB. I want to change the values in this column so that each year is replaced by a simple number. For example, the lowest year, 1911, is changed to 1, the 2nd lowest value in DOB is replaced by 2, the 3rd lowest by 3, etc.
How do I make this change fast?
Upvotes: 0
Views: 33
Reputation: 862481
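You can use Series.rank with method='dense', which gives equal years the same rank and numbers the distinct years consecutively with no gaps: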
df['DOB1'] = df['DOB'].rank(method='dense')
print (df)
ID DOB DOB1
0 223725 1911.0 1.0
1 223725 2000.0 3.0
2 223725 2006.0 4.0
3 223725 1985.0 2.0
4 223725 1911.0 1.0
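Note that rank returns floats. If you want integers, a minimal sketch along these lines might help. The sample frame below is made up to mirror the output above, the DOB2 column name is only illustrative, and the cast to int assumes DOB has no missing values. It also shows pd.factorize with sort=True as a possible alternative, which yields 0-based codes in sorted order:

```python
import pandas as pd

# Hypothetical sample data mirroring the columns in the question.
df = pd.DataFrame({
    "ID": [223725] * 5,
    "DOB": [1911.0, 2000.0, 2006.0, 1985.0, 1911.0],
})

# Dense rank, cast to int (assumes DOB contains no NaN values).
df["DOB1"] = df["DOB"].rank(method="dense").astype(int)

# Alternative: factorize with sort=True returns 0-based codes
# ordered by the sorted unique values, so add 1 to start at 1.
codes, uniques = pd.factorize(df["DOB"], sort=True)
df["DOB2"] = codes + 1

print(df)
```

Both columns should hold the same numbers here. Which one is faster on your data is something to measure rather than assume.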
Upvotes: 2