Reputation: 879
Given a dataframe like this:
'John', 0.25
'Mary', 0.2
'Adam', 0.1
'Andrew', 0.6
I would like to generate a unique integer for every category in a given series. For example, in the case above, the output could be something like this:
0, 0.25
1, 0.2
2, 0.1
3, 0.6
Ideally this would use only pandas or the standard library.
Upvotes: 2
Views: 759
Reputation: 862761
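I think you can use pd.factorize, which returns a tuple of integer codes (one per row, with equal values sharing a code) and the unique values those codes refer to.
For example: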
print(df)
          a     b
0    'John'  0.25
1    'Mary'  0.20
2    'Mary'  0.20
3    'Adam'  0.10
4    'Adam'  0.10
5    'Adam'  0.10
6  'Andrew'  0.60
print(pd.factorize(df.a))
(array([0, 1, 1, 2, 2, 2, 3]),
 Index(["'John'", "'Mary'", "'Adam'", "'Andrew'"], dtype='object'))
df['a'] = pd.factorize(df.a)[0]
print(df)
   a     b
0  0  0.25
1  1  0.20
2  1  0.20
3  2  0.10
4  2  0.10
5  2  0.10
6  3  0.60
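If it helps, here is a self-contained sketch of the same idea that also keeps the mapping back to the original names. The sample data is an assumption modelled on the frame above (without the literal quote characters), and a_id, name_by_id and cat_codes are illustrative names of my own, not anything pandas requires. The last line shows the category dtype as a possible alternative; note that it numbers the values in sorted order rather than in order of first appearance.

```python
import pandas as pd

# Sample data modelled on the frame above (assumed, for illustration only).
df = pd.DataFrame({
    "a": ["John", "Mary", "Mary", "Adam", "Adam", "Adam", "Andrew"],
    "b": [0.25, 0.20, 0.20, 0.10, 0.10, 0.10, 0.60],
})

# factorize returns (codes, uniques): codes are assigned in order of
# first appearance, and uniques[code] gives back the original value.
codes, uniques = pd.factorize(df["a"])

# Store the codes in a new column so the original names are not lost.
df["a_id"] = codes

# Lookup table from integer id back to the original value.
name_by_id = dict(enumerate(uniques))

# Alternative: the category dtype. Its codes follow the sorted order of
# the categories, so the numbering differs from factorize.
cat_codes = df["a"].astype("category").cat.codes

print(df)
print(name_by_id)
```

Writing the codes to a separate column rather than overwriting a is just a design choice here: it makes it easy to check the result, and the lookup table lets you decode the ids later.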
Upvotes: 1