bshah

Reputation: 167

Scikit Learn : Logistic Regression Error

I am trying to run logistic regression on my data (6 categorical variables, 1 integer) using scikit-learn. I am following the scikit-learn documentation, but when I try to fit the model I get the ValueError shown below. Can someone please help?

#Below are the variables of my data.
train_data.dtypes
    OUTPUT
    TripType                 category
    VisitNumber              category
    Weekday                  category
    Upc                      category
    ScanCount                   int64
    DepartmentDescription    category
    FinelineNumber           category
    dtype: object


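from sklearn import linear_model

# Features: every column from VisitNumber through FinelineNumber
X = train_data.loc[:, 'VisitNumber':'FinelineNumber']
# Target: the TripType column
Y = train_data.loc[:, 'TripType':'TripType']
logreg = linear_model.LogisticRegression()
logreg.fit(X, Y)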

**ValueError: could not convert string to float: GROCERY DRY GOODS**

Upvotes: 1

Views: 6928

Answers (2)

PJay

Reputation: 2797

You cannot use category names directly as features in logistic regression. You need to convert them into encoded vectors (dummy variables). A feature with 6 categories needs 5 dummy variables, because the remaining category acts as the baseline.

See the Encoding Categorical Features section of the scikit-learn preprocessing documentation: http://scikit-learn.org/stable/modules/preprocessing.html
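As a rough illustration of the dummy-variable idea, here is a minimal sketch using `pandas.get_dummies`. The `toy` frame and its values are made up for the example (only the column names `Weekday` and `ScanCount` come from the question), and `drop_first=True` is what gives k - 1 dummies for a k-level feature.

```python
import pandas as pd

# Made-up stand-in for two of the question's columns; the values are hypothetical.
toy = pd.DataFrame({
    "Weekday": pd.Categorical(["Mon", "Tue", "Wed", "Mon"]),
    "ScanCount": [1, 2, 1, 3],
})

# Replace the categorical column with dummy columns.
# drop_first=True drops the first level, leaving k - 1 dummies for k levels.
encoded = pd.get_dummies(toy, columns=["Weekday"], drop_first=True)

print(encoded.columns.tolist())
```

With three weekday levels this leaves two dummy columns next to `ScanCount`. For columns with a very large number of levels (possibly `Upc` in your data) the dense frame can get big, so a sparse encoder may be a better fit.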

Upvotes: 0

jakevdp

Reputation: 86330

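Scikit-learn estimators can only handle numerical features, which is why the fit fails when it tries to convert a category label such as GROCERY DRY GOODS to a float. For some ideas on how to proceed in your case, see Encoding Categorical Features in the scikit-learn docs.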
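One possible way to do the encoding inside scikit-learn itself is sketched below, assuming a reasonably recent version that provides `ColumnTransformer`. The miniature `train_data` is hypothetical (the real values are not shown in the question), and only a subset of the question's columns is used to keep it short.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical miniature of train_data; all values are made up.
train_data = pd.DataFrame({
    "TripType": [3, 8, 3, 8, 3, 8],
    "Weekday": ["Mon", "Tue", "Mon", "Wed", "Tue", "Wed"],
    "DepartmentDescription": [
        "GROCERY DRY GOODS", "DAIRY", "GROCERY DRY GOODS",
        "DAIRY", "PRODUCE", "DAIRY",
    ],
    "ScanCount": [1, 2, 1, 3, 1, 2],
})

categorical_cols = ["Weekday", "DepartmentDescription"]
X = train_data[categorical_cols + ["ScanCount"]]
y = train_data["TripType"]  # 1-D Series rather than a one-column DataFrame

# One-hot encode the categorical columns and pass ScanCount through unchanged.
preprocess = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_cols)],
    remainder="passthrough",
)

# Chain the encoding and the classifier so both are fitted together.
model = make_pipeline(preprocess, LogisticRegression(max_iter=1000))
model.fit(X, y)
predictions = model.predict(X)
```

Keeping the encoder in the pipeline means the same encoding is applied at prediction time, and `handle_unknown="ignore"` keeps unseen categories in new data from raising an error.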

Upvotes: 2
