Reputation: 167
I am trying to run logistic regression on my data (6 categorical columns, 1 integer column) using scikit-learn. I am following the scikit-learn documentation, but when I try to fit my data I get the ValueError shown below. Can someone please help?
# Below are the dtypes of the columns in my data.
train_data.dtypes
OUTPUT
TripType category
VisitNumber category
Weekday category
Upc category
ScanCount int64
DepartmentDescription category
FinelineNumber category
dtype: object
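from sklearn import linear_model
# Features: every column from VisitNumber through FinelineNumber
X = train_data.loc[:, 'VisitNumber':'FinelineNumber']
# Target as a 1-D Series (a one-column DataFrame slice triggers a shape warning)
Y = train_data['TripType']
logreg = linear_model.LogisticRegression()
logreg.fit(X, Y)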
**ValueError: could not convert string to float: GROCERY DRY GOODS**
Upvotes: 1
Views: 6928
Reputation: 2797
You cannot use category names directly as features in logistic regression. You need to convert them into encoded vectors (dummy variables). A feature with k categories needs k-1 dummy variables, so a feature with 6 categories needs 5; the omitted category serves as the baseline.
See the Encoding Categorical Features section of the scikit-learn preprocessing documentation: http://scikit-learn.org/stable/modules/preprocessing.html
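As a rough illustration of the dummy-variable idea, here is a minimal sketch using pandas `get_dummies`. The toy DataFrame is made up (it only borrows a few column names from the question), and `drop_first=True` is what gives k-1 columns per categorical feature. Treat it as one possible approach under those assumptions, not the only way to do it.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data; only the column names come from the question.
train_data = pd.DataFrame({
    "TripType": ["A", "B", "A", "B", "A", "B"],
    "Weekday": ["Mon", "Tue", "Mon", "Wed", "Tue", "Wed"],
    "ScanCount": [1, 2, 1, 3, 2, 1],
    "DepartmentDescription": [
        "GROCERY DRY GOODS", "DAIRY", "DAIRY",
        "GROCERY DRY GOODS", "PRODUCE", "PRODUCE",
    ],
}).astype({"Weekday": "category", "DepartmentDescription": "category"})

# Replace each categorical column with k-1 dummy columns (drop_first=True);
# the integer column ScanCount is left untouched.
X = pd.get_dummies(train_data.drop(columns="TripType"), drop_first=True, dtype=float)
y = train_data["TripType"]

logreg = LogisticRegression()
logreg.fit(X, y)  # no "could not convert string to float" error now
predictions = logreg.predict(X)
```

Note that columns with very many distinct values (such as `Upc` or `VisitNumber` in the question) would produce a very wide dummy matrix, so it may be worth considering whether they should be treated as categorical features at all.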
Upvotes: 0
Reputation: 86330
Scikit-learn estimators such as LogisticRegression only accept numerical features, so string-valued categories have to be encoded first. For some ideas on how to proceed in your case, see Encoding Categorical Features in the scikit-learn docs.
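For example, the encoding can be done inside scikit-learn itself with `OneHotEncoder` wrapped in a `ColumnTransformer`. The sketch below uses made-up toy data and a hypothetical `categorical_cols` list; only the column names are taken from the question, and the choice of `handle_unknown="ignore"` is an assumption about how unseen categories should be treated at prediction time.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data; only the column names come from the question.
train_data = pd.DataFrame({
    "TripType": ["A", "B", "A", "B", "A", "B"],
    "Weekday": ["Mon", "Tue", "Mon", "Wed", "Tue", "Wed"],
    "ScanCount": [1, 2, 1, 3, 2, 1],
    "DepartmentDescription": [
        "GROCERY DRY GOODS", "DAIRY", "DAIRY",
        "GROCERY DRY GOODS", "PRODUCE", "PRODUCE",
    ],
})

feature_cols = ["Weekday", "ScanCount", "DepartmentDescription"]
categorical_cols = ["Weekday", "DepartmentDescription"]  # illustrative choice

# One-hot encode the categorical columns and pass ScanCount through unchanged.
preprocess = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_cols)],
    remainder="passthrough",
)

# Chaining the encoder and the classifier means the same encoding
# is applied automatically at both fit and predict time.
model = make_pipeline(preprocess, LogisticRegression())
model.fit(train_data[feature_cols], train_data["TripType"])
predictions = model.predict(train_data[feature_cols])
```

Keeping the encoder in the pipeline avoids having to align dummy columns by hand between training and test data.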
Upvotes: 2