Reputation: 10139
I'm using Python 2.7. My question is about the fit method. If the features passed in the parameter X include non-numeric ones (e.g. string-typed features such as Male, Female), do I need to convert them into numeric features, or is that at least recommended (for performance or other reasons)? And what about multi-valued string features (e.g. a feature geo that could take any of the values San Francisco, San Jose, Mountain View, etc.)?
regards, Lin
Upvotes: 0
Views: 1063
Reputation: 21
Just to add a bit to MhFarahani's answer: Yes, you need to convert those labels to numerical values (generally 0 or 1). For something like gender, you would want a column that has 0 for male and 1 for female, or vice versa. Geographical location is a bit more complicated. If there's a reasonable number of possible values, you could use the get_dummies function in pandas (see the pandas documentation) to automatically populate your DataFrame with one 0/1 column per possible location; you could then drop one of those columns to make that location the 'default'.
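As a rough sketch of what that could look like (the toy DataFrame and its column names are made up for illustration, and dtype=int assumes a reasonably recent pandas):

```python
import pandas as pd

# Hypothetical toy data; the column names and values are illustrative only.
df = pd.DataFrame({
    "gender": ["Male", "Female", "Female", "Male"],
    "geo": ["San Francisco", "San Jose", "Mountain View", "San Jose"],
})

# Two-valued feature: map it onto a single 0/1 column.
df["gender"] = df["gender"].map({"Male": 0, "Female": 1})

# Multi-valued feature: one indicator column per value.
# drop_first=True drops the first category in sorted order
# ("Mountain View" here), which then acts as the 'default'.
encoded = pd.get_dummies(df, columns=["geo"], drop_first=True, dtype=int)

print(encoded)
```

A row whose geo_* columns are all 0 then stands for the dropped location.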
Upvotes: 2
Reputation: 970
You must encode categorical features and convert them to numerical values if you want to use sklearn. This applies to all sklearn estimators (including LogisticRegression), and it does not matter which version of Python you are using.
See section 4.3.4, "Encoding categorical features", at http://scikit-learn.org/stable/modules/preprocessing.html#encoding-categorical-features for more information.
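For illustration, here is a minimal sketch of doing the encoding inside sklearn itself. It assumes a scikit-learn version whose OneHotEncoder accepts string categories directly (0.20 or later), and the toy data and labels are invented:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: each row is [gender, geo]; the labels are made up.
X = [
    ["Male", "San Francisco"],
    ["Female", "San Jose"],
    ["Female", "Mountain View"],
    ["Male", "San Jose"],
]
y = [0, 1, 1, 0]

# The encoder turns every string column into 0/1 indicator columns
# before the data reaches the estimator's fit method.
# handle_unknown="ignore" encodes unseen categories as all zeros.
model = make_pipeline(
    OneHotEncoder(handle_unknown="ignore"),
    LogisticRegression(),
)
model.fit(X, y)

pred = model.predict([["Female", "San Francisco"]])
print(pred)
```

Putting the encoder in a pipeline means the same encoding learned in fit is reused at predict time.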
Upvotes: 1