Joracosu

Reputation: 319

Create a new column with numbers based on the unique strings of another column

I have a DataFrame with a column that contains words. I want integers that represent those values, for example in another column, so that equal words always get the same number.

In the following example, I have 'col_1' and 'col_2', and I want to produce 'col_3':

'col_1' | 'col_2' | 'col_3'
---------------------------
  0     |  a      |  0
  1     |  a      |  0
  2     |  b      |  1
  3     |  c      |  2
  4     |  b      |  1
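
For anyone who wants to reproduce this, here is a minimal sketch of the input frame. The name `df` and the integer dtype of 'col_1' are assumptions taken from the table above, not from real data:

```python
import pandas as pd

# Rebuild the example input; 'col_3' is the column to be computed.
df = pd.DataFrame({
    "col_1": [0, 1, 2, 3, 4],
    "col_2": ["a", "a", "b", "c", "b"],
})
print(df)
```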

Upvotes: 2

Views: 539

Answers (3)

Scott Boston

Reputation: 153480

Another way to do this is to convert the column to the 'category' dtype and read its 'codes' attribute. The codes follow the sorted order of the categories:

df['col_3'] = df['col_2'].astype('category').cat.codes

Output:

   col_1 col_2  col_3
0      0     a      0
1      1     a      0
2      2     b      1
3      3     c      2
4      4     b      1
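
One detail worth knowing with this approach: missing values are not a category, so they get the code -1. The sketch below uses a made-up Series (not the data from the question) to show this, and also that the codes follow the sorted categories rather than the order of appearance:

```python
import pandas as pd

# Hypothetical data: unsorted values plus one missing entry.
s = pd.Series(["b", "a", None, "c", "b"])

cat = s.astype("category")
categories = cat.cat.categories.tolist()  # sorted unique values
codes = cat.cat.codes.tolist()            # position of each value in `categories`

print(categories)  # ['a', 'b', 'c']
print(codes)       # [1, 0, -1, 2, 1]  (-1 marks the missing value)
```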

Upvotes: 2

ansev

Reputation: 30920

IIUC, you want groupby.ngroup, which numbers each group (in sorted key order by default):

df['col_3'] = df.groupby('col_2').ngroup()
print(df)

   col_1 col_2  col_3
0      0     a      0
1      1     a      0
2      2     b      1
3      3     c      2
4      4     b      1
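
In the question's data the words first appear in alphabetical order, so the numbering looks the same either way. A small sketch with made-up, unsorted data shows the difference between the default and `sort=False`, which numbers the groups in the order they are first seen:

```python
import pandas as pd

# Hypothetical data where 'b' appears before 'a'.
df = pd.DataFrame({"col_2": ["b", "a", "b", "c"]})

# Default (sort=True): groups are numbered by sorted key, a=0, b=1, c=2.
sorted_ids = df.groupby("col_2").ngroup().tolist()

# sort=False: groups are numbered by first appearance, b=0, a=1, c=2.
seen_ids = df.groupby("col_2", sort=False).ngroup().tolist()

print(sorted_ids)  # [1, 0, 1, 2]
print(seen_ids)    # [0, 1, 0, 2]
```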

Upvotes: 5

Andy L.

Reputation: 25259

Try factorize, which numbers the values in order of first appearance:

df['col_3'] = df.col_2.factorize()[0]

Out[1641]:
   col_1 col_2  col_3
0      0     a      0
1      1     a      0
2      2     b      1
3      3     c      2
4      4     b      1
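
factorize also returns the unique values as its second element, which is handy for mapping the codes back to the original words. A minimal sketch on made-up, unsorted data (the variable names are only illustrative), including `sort=True` for alphabetical numbering:

```python
import pandas as pd

# Hypothetical data where 'b' appears before 'a'.
s = pd.Series(["b", "a", "b", "c"])

# Default: codes follow the order of first appearance.
codes, uniques = s.factorize()
print(codes.tolist())    # [0, 1, 0, 2]
print(uniques.tolist())  # ['b', 'a', 'c']

# sort=True: codes follow the sorted unique values instead.
sorted_codes, sorted_uniques = s.factorize(sort=True)
print(sorted_codes.tolist())    # [1, 0, 1, 2]
print(sorted_uniques.tolist())  # ['a', 'b', 'c']

# Indexing the uniques with the codes recovers the original values.
restored = uniques[codes].tolist()
print(restored)  # ['b', 'a', 'b', 'c']
```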

Upvotes: 6
