Convert pandas series into integers

Question

Given a dataframe like this:

'John', 0.25
'Mary', 0.2
'Adam', 0.1
'Andrew', 0.6

I would like to generate a unique integer for every category in a given series. For example, for the data above, the output could look like this:

0, 0.25
1, 0.2
2, 0.1
3, 0.6

Preferably using only pandas or the standard library.

jezrael · Accepted Answer

I think you can use factorize. It returns a tuple of the integer codes and the unique values, so element [0] holds the codes:

print(df)
          a     b
0    'John'  0.25
1    'Mary'  0.20
2    'Mary'  0.20
3    'Adam'  0.10
4    'Adam'  0.10
5    'Adam'  0.10
6  'Andrew'  0.60

print(pd.factorize(df.a))
(array([0, 1, 1, 2, 2, 2, 3]),
 Index([''John'', ''Mary'', ''Adam'', ''Andrew''], dtype='object'))

df['a'] = pd.factorize(df.a)[0]
print(df)

   a     b
0  0  0.25
1  1  0.20
2  1  0.20
3  2  0.10
4  2  0.10
5  2  0.10
6  3  0.60
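
Since the question also asks about the standard library, here is a minimal sketch of the same idea without pandas. The helper name `encode_labels` is made up for this example and is not part of pandas or any other library. It numbers labels in order of first appearance, which matches the codes shown above.

```python
def encode_labels(values):
    """Return (codes, uniques), numbering labels in order of first appearance.

    Hypothetical helper for illustration only; not a pandas function.
    """
    code_of = {}  # label -> integer code
    codes = []
    for value in values:
        # setdefault stores the next unused integer the first time a label
        # is seen and returns the stored code on every later occurrence.
        codes.append(code_of.setdefault(value, len(code_of)))
    # Dicts keep insertion order (Python 3.7+), so the keys are in code order.
    return codes, list(code_of)


names = ["John", "Mary", "Mary", "Adam", "Adam", "Adam", "Andrew"]
codes, uniques = encode_labels(names)
print(codes)    # [0, 1, 1, 2, 2, 2, 3]
print(uniques)  # ['John', 'Mary', 'Adam', 'Andrew']
```

To get the original label back from a code, index into the list of unique values: `uniques[codes[3]]` is `'Adam'`.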
