logistic regression feature value normalization in scikit-learn

Question

Using Python 2.7. My question is about the fit method. If some of the features passed in X are non-numeric (for example, string features such as Male and Female), do I need to convert them to numeric features, or is it at least recommended (for performance or other reasons)? And what about multi-valued string features, for example a geo feature whose value can be any of San Francisco, San Jose, Mountain View, etc.?

http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression.fit

regards, Lin

MhFarahani · Accepted Answer

You must encode categorical features and convert them to numerical values if you want to use sklearn. This applies to all sklearn estimators (including LogisticRegression), and it does not matter which version of Python you are using.

Look at section 4.3.4, "Encoding categorical features", of http://scikit-learn.org/stable/modules/preprocessing.html#encoding-categorical-features for more information.
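To connect this back to the fit method in the question, one possible setup (a sketch, not the only way) chains the encoder and LogisticRegression in a scikit-learn Pipeline, so raw string features can be passed to fit and predict and the same learned encoding is reused for new data. The training rows and labels are hypothetical, and as before this assumes Python 3 and a recent scikit-learn release.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training data: string features (gender, geo) and binary labels.
X_train = np.array([
    ["Male", "San Francisco"],
    ["Female", "San Jose"],
    ["Female", "Mountain View"],
    ["Male", "San Jose"],
    ["Female", "San Francisco"],
    ["Male", "Mountain View"],
])
y_train = np.array([1, 0, 0, 1, 1, 0])

# The encoder turns each string column into 0/1 indicator columns before the
# data reaches LogisticRegression, which only accepts numeric input.
model = make_pipeline(
    OneHotEncoder(handle_unknown="ignore"),
    LogisticRegression(),
)
model.fit(X_train, y_train)

# New rows go through the same fitted encoding; "Palo Alto" was never seen
# during fit, so its geo indicator columns are all zero.
X_new = np.array([["Female", "San Jose"], ["Male", "Palo Alto"]])
print(model.predict(X_new))
```

Keeping the encoder inside the pipeline avoids accidentally encoding training and test data with different category-to-column mappings.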
