theodorecruz

Reputation: 1

Regression/Classification Error in KNN algorithm for choice prediction

I am trying to use KNN to create a choice prediction model. The data is formatted as follows, where I am trying to predict whether a person chooses X or Y.

[Image: Choice Prediction Data Structure]

When I run the code, I get the following error:

"In train.default(training[, 1:7], training[, 8], method = "knn") : You are trying to do regression and your outcome only has two possible values Are you trying to do classification? If so, use a 2 level factor as your outcome column."

Here is the rest of the relevant code:

index <- createDataPartition(dataset_training$choiceprobX, p=0.5, list=FALSE)
print(dataset_training$choiceprobX)
index
training <- dataset_training[index,]
testing <- dataset_training[-index,]
training
testing
model_knn <- train(training[, 1:7], training[, 8], method='knn')

What am I doing wrong? Do I need to change to a classification? If so, how exactly do I do that?

Upvotes: 0

Views: 676

Answers (2)

StupidWolf

Reputation: 46908

Your target variable is numeric (0/1). You need to convert it into a factor:

library(caret)

dataset_training = MASS::Pima.te
dataset_training$type = as.numeric(dataset_training$type)-1
head(dataset_training)

  npreg glu bp skin  bmi   ped age type
1     6 148 72   35 33.6 0.627  50    1
2     1  85 66   29 26.6 0.351  31    0
3     1  89 66   23 28.1 0.167  21    0
4     3  78 50   32 31.0 0.248  26    1
5     2 197 70   45 30.5 0.158  53    1
6     5 166 72   19 25.8 0.587  51    1

index <- createDataPartition(dataset_training$type, p=0.5, list=FALSE)
training <- dataset_training[index,]
model_knn <- train(training[, 1:7], training[, 8], method='knn')

Warning message:
In train.default(training[, 1:7], training[, 8], method = "knn") :
  You are trying to do regression and your outcome only has two possible values Are you trying to do classification?[..]

This gives the same warning. Now convert the outcome to a factor:

dataset_training$type = factor(dataset_training$type)
index <- createDataPartition(dataset_training$type, p=0.5, list=FALSE)
training <- dataset_training[index,]
model_knn <- train(training[, 1:7], training[, 8], method='knn')
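
To check that the conversion took effect, something like the sketch below may help. It reuses the Pima.te example from above; the seed value, the `testing` split, and the accuracy check are my own additions for illustration, not part of the original question. With a factor outcome, `createDataPartition` also stratifies the split by class rather than by quantiles of a number.

```r
library(caret)

# Example data from this answer: Pima.te ships with MASS. type is recoded to
# 0/1 and then turned into a factor so caret treats the task as classification.
dataset_training <- MASS::Pima.te
dataset_training$type <- factor(as.numeric(dataset_training$type) - 1)

set.seed(42)  # arbitrary seed (an assumption), only so the split is repeatable
index <- createDataPartition(dataset_training$type, p = 0.5, list = FALSE)
training <- dataset_training[index, ]
testing  <- dataset_training[-index, ]

model_knn <- train(training[, 1:7], training[, 8], method = "knn")

# Should report "Classification" now that the outcome is a factor
print(model_knn$modelType)

# Predict the held-out rows and compare them with the true labels
pred <- predict(model_knn, newdata = testing[, 1:7])
print(table(predicted = pred, actual = testing$type))
print(mean(pred == testing$type))  # share of held-out rows predicted correctly
```

If `modelType` still says "Regression", the outcome column passed to `train` is most likely still numeric.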

Upvotes: 1

ManojK

Reputation: 1640

Though I don't use R much, I can spot some mistakes in the code.

It is simpler if your target variable is in a single column coded as 1 and 0.

Add a new column which will be your target variable like this:

dataset_training$target <- ifelse(dataset_training$choiceprobX == 1,1,0)

Now use this column as the target variable, where 1 means the class is X and 0 means Y.

Since you are fitting a classification model, you need to convert this variable to a factor:

dataset_training$target <- as.factor(dataset_training$target)
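
As a possible alternative, the recoding and the factor conversion can be done in one step with labelled levels. This is only a sketch: the `choiceprobX` vector below is made-up data standing in for the real column, and the labels "X" and "Y" are my choice. Named levels can also matter later, since caret expects class levels to be valid R variable names when class probabilities are requested (`classProbs = TRUE` in `trainControl`), which "0" and "1" are not.

```r
# Hypothetical 0/1 outcome standing in for dataset_training$choiceprobX
choiceprobX <- c(1, 0, 0, 1, 1, 0)

# Map 1 -> "X" and 0 -> "Y" in one step; the level listed first becomes
# the first class of the factor
target <- factor(choiceprobX, levels = c(1, 0), labels = c("X", "Y"))

print(target)
print(levels(target))
print(table(target))  # class counts, to confirm nothing turned into NA
```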

Now try fitting the model. If you still get errors, post them in the comments.

Upvotes: 0
