How can I encode a categorical column with the codes I want?

I've got a dataframe like this:

df = pd.DataFrame({'months': ['FEBRUARY', 'MARCH', 'MAY', 'DECEMBER', 'MAY']})

And I want to get:

[['JANUARY', 1], ['FEBRUARY', 2], ['MARCH', 3]]

I think it should be very easy, but when I try this dummy example from sklearn:

from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder(handle_unknown='ignore')
X = [[1,'Male'], [ 3,'Female']]
enc.fit(X)

I get the following error:

 ValueError: could not convert string to float: 'Male'

Thanks in advance.

Upvotes: 1

Views: 93

Answers (1)

nimrodz

Reputation: 1594

You can use Series.map with a dictionary of the codes you want:

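# Assumes df has a 'gender' column; keys are case-sensitive, and values missing from the dict become NaN.
gender = {'Male': 1, 'Female': 3}
df['gender'].map(gender)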
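Applied to the months column from the question, the same dictionary-plus-map idea might look like the sketch below. The names MONTHS, month_codes and month_code are made up for illustration, and it assumes the month names are always upper-case English, as in the example dataframe.

```python
import pandas as pd

df = pd.DataFrame({'months': ['FEBRUARY', 'MARCH', 'MAY', 'DECEMBER', 'MAY']})

# Hypothetical helper list: the categories in the order you want them coded.
MONTHS = ['JANUARY', 'FEBRUARY', 'MARCH', 'APRIL', 'MAY', 'JUNE',
          'JULY', 'AUGUST', 'SEPTEMBER', 'OCTOBER', 'NOVEMBER', 'DECEMBER']

# Build {'JANUARY': 1, 'FEBRUARY': 2, ...}; start=1 so the codes begin at 1.
month_codes = {name: code for code, name in enumerate(MONTHS, start=1)}

# map looks each value up in the dict; anything not in the dict becomes NaN.
df['month_code'] = df['months'].map(month_codes)
print(df)
```

If the column can contain mixed case, calling df['months'].str.upper() before map should keep the lookup from silently producing NaN.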

Upvotes: 1
