Reputation: 1
I am trying to use KNN to create a choice prediction model. The data is formatted as follows, where I am trying to predict whether a person chooses X or Y.
[Image: Choice Prediction Data Structure]
When I run the code, I get the following error:
"In train.default(training[, 1:7], training[, 8], method = "knn") : You are trying to do regression and your outcome only has two possible values Are you trying to do classification? If so, use a 2 level factor as your outcome column."
Here is the rest of the relevant code:
index <- createDataPartition(dataset_training$choiceprobX, p=0.5, list=FALSE)
print(dataset_training$choiceprobX)
index
training <- dataset_training[index,]
testing <- dataset_training[-index,]
training
testing
model_knn <- train(training[, 1:7], training[, 8], method='knn')
What am I doing wrong? Do I need to switch to classification? If so, how exactly do I do that?
Upvotes: 0
Views: 676
Reputation: 46908
Your target variable is stored as numeric 0/1, so caret treats the problem as regression. You need to convert it to a factor. To reproduce the problem:
library(caret)
dataset_training = MASS::Pima.te
dataset_training$type = as.numeric(dataset_training$type)-1
head(dataset_training)
npreg glu bp skin bmi ped age type
1 6 148 72 35 33.6 0.627 50 1
2 1 85 66 29 26.6 0.351 31 0
3 1 89 66 23 28.1 0.167 21 0
4 3 78 50 32 31.0 0.248 26 1
5 2 197 70 45 30.5 0.158 53 1
6 5 166 72 19 25.8 0.587 51 1
index <- createDataPartition(dataset_training$type, p=0.5, list=FALSE)
training <- dataset_training[index,]
model_knn <- train(training[, 1:7], training[, 8], method='knn')
Warning message:
In train.default(training[, 1:7], training[, 8], method = "knn") :
You are trying to do regression and your outcome only has two possible values Are you trying to do classification?[..]
This gives the same warning as in the question. Now convert the target to a factor:
dataset_training$type = factor(dataset_training$type)
index <- createDataPartition(dataset_training$type, p=0.5, list=FALSE)
training <- dataset_training[index,]
model_knn <- train(training[, 1:7], training[, 8], method='knn')
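As a quick check that the conversion worked, the sketch below repeats the steps above and then inspects the fitted object. The seed, the "No"/"Yes" labels, and the held-out `testing` split are assumptions added for illustration; they are not part of the original answer.

```r
library(caret)

set.seed(42)  # assumed seed, only so the split is reproducible

dataset_training <- MASS::Pima.te
# Recode the outcome to numeric 0/1 to mimic the data in the question
dataset_training$type <- as.numeric(dataset_training$type) - 1

# Convert the 0/1 outcome to a two-level factor (labels are an assumption)
dataset_training$type <- factor(dataset_training$type,
                                levels = c(0, 1),
                                labels = c("No", "Yes"))

index <- createDataPartition(dataset_training$type, p = 0.5, list = FALSE)
training <- dataset_training[index, ]
testing  <- dataset_training[-index, ]

model_knn <- train(training[, 1:7], training[, 8], method = "knn")

# caret records which kind of model it fitted
model_knn$modelType   # "Classification" when the outcome is a factor

# Predict on the held-out half and compare with the true classes
predictions <- predict(model_knn, testing[, 1:7])
confusionMatrix(predictions, testing$type)
```

Using text labels rather than "0" and "1" is a design choice: caret needs syntactically valid level names if class probabilities are requested later via `trainControl(classProbs = TRUE)`.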
Upvotes: 1
Reputation: 1640
I don't use R much, but I can point out some mistakes in the code.
It is simpler if your target variable is a single column coded as 1 and 0.
Add a new column to serve as the target variable like this:
dataset_training$target <- ifelse(dataset_training$choiceprobX == 1,1,0)
Now use this column as the target variable, where 1 means the class is X and 0 means Y.
Since you are fitting a classification model, you need to convert this variable to a factor:
dataset_training$target <- as.factor(dataset_training$target)
Now try to fit the model. You might still get some errors, which you can post in the comments.
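To see what the two conversion steps do, here is a minimal sketch on a toy vector. The values in `choiceprobX` below are a hypothetical stand-in for the asker's column, and the "Y"/"X" labels are an assumption chosen to match the 0 = Y, 1 = X coding described above.

```r
# Hypothetical stand-in for dataset_training$choiceprobX
choiceprobX <- c(1, 0, 0, 1, 1, 0)

# Step 1: code the target as 1 (chose X) or 0 (chose Y)
target <- ifelse(choiceprobX == 1, 1, 0)
class(target)     # "numeric", which caret would treat as regression

# Step 2: convert to a two-level factor so caret does classification
target <- factor(target, levels = c(0, 1), labels = c("Y", "X"))
class(target)     # "factor"
levels(target)    # "Y" "X"
table(target)     # counts per class
```

With the real data, the same factor column would be passed as the outcome, for example `train(training[, 1:7], training$target, method = 'knn')`, assuming the predictors are still in columns 1 to 7.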
Upvotes: 0