Reputation: 319
I have a DataFrame with a column that contains words. I want to represent each distinct word with a number, for example in another column.
In the example below, I have 'col_1' and 'col_2', and I want to get 'col_3':
'col_1' | 'col_2' | 'col_3'
---------------------------
0 | a | 0
1 | a | 0
2 | b | 1
3 | c | 2
4 | b | 1
Upvotes: 2
Views: 539
Reputation: 153480
Another way to do this is to convert the column to the 'category' dtype and read the integer codes from its 'cat.codes' attribute:
df['col_3'] = df['col_2'].astype('category').cat.codes
Output:
col_1 col_2 col_3
0 0 a 0
1 1 a 0
2 2 b 1
3 3 c 2
4 4 b 1
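If you also need to go from a code back to the original word, the categorical keeps that lookup in `cat.categories`. Here is a minimal sketch, assuming the same data as in the question; the names `cat` and `mapping` are just illustrative:

```python
import pandas as pd

# Same data as in the question
df = pd.DataFrame({"col_1": range(5), "col_2": ["a", "a", "b", "c", "b"]})

# Convert once and keep the categorical so the labels stay available
cat = df["col_2"].astype("category")
df["col_3"] = cat.cat.codes

# Position i in cat.categories is the label for code i
mapping = dict(enumerate(cat.cat.categories))

print(df)
print(mapping)
```

Note that for string data the categories are sorted, so the codes follow alphabetical order rather than order of appearance.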
Upvotes: 2
Reputation: 30920
IIUC, you want groupby.ngroup:
df['col_3'] = df.groupby('col_2').ngroup()
print(df)
col_1 col_2 col_3
0 0 a 0
1 1 a 0
2 2 b 1
3 3 c 2
4 4 b 1
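One thing to be aware of: `groupby` sorts the group keys by default, so the numbers follow sorted order. If you would rather number the groups in the order they first appear, passing `sort=False` should do it. A small sketch, using made-up data where the two orders differ (the column names `sorted_id` and `seen_id` are hypothetical):

```python
import pandas as pd

# Illustrative data: "b" appears before "a"
df = pd.DataFrame({"col_2": ["b", "a", "b", "c"]})

# Default: group keys are sorted, so a -> 0, b -> 1, c -> 2
df["sorted_id"] = df.groupby("col_2").ngroup()

# sort=False: groups are numbered in order of first appearance
df["seen_id"] = df.groupby("col_2", sort=False).ngroup()

print(df)
```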
Upvotes: 5
Reputation: 25259
Try factorize, which numbers the values in order of first appearance:
df['col_3'] = df.col_2.factorize()[0]
Output:
col_1 col_2 col_3
0 0 a 0
1 1 a 0
2 2 b 1
3 3 c 2
4 4 b 1
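`factorize` returns a pair, the codes and the unique values, which is why the `[0]` is needed above. A short sketch of what the second element looks like and how missing values are handled, using made-up data (the variable names are only for illustration):

```python
import pandas as pd

# Illustrative data with a missing value
s = pd.Series(["b", "a", None, "b"])

# codes: one integer per row; uniques: the label for each code
codes, uniques = s.factorize()
print(codes)          # missing values get the code -1
print(list(uniques))  # labels in order of first appearance

# sort=True numbers the labels in sorted order instead
sorted_codes, sorted_uniques = s.factorize(sort=True)
print(sorted_codes)
print(list(sorted_uniques))
```

`uniques[code]` gives the original word back for any non-negative code.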
Upvotes: 6