dgene54

Reputation: 81

Error in knn(train = trainset, test = testset, cl, k = 1, l = 0, prob = FALSE, : 'train' and 'class' have different lengths

I get the above error when running the following code:

install.packages("class")
library("class")

mydata <- read.table("http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv", sep=";", header=TRUE);

index <- 1:nrow(mydata)
testindex <- sample(index, trunc(length(index)/6))
trainset <-mydata[testindex,]
testset <- mydata[-testindex,]


cl <- factor(c(rep("quality",3),rep("residual.sugar",3)))
knn(train = trainset, test = testset, cl, k = 1, l = 0, prob = FALSE, use.all = TRUE)

Please advise. Feel free to change the way I set up 'cl'; I honestly have no idea what I'm doing with it. I want to classify 'quality' based on 'residual.sugar'.

Upvotes: 2

Views: 2020

Answers (1)

LyzandeR

Reputation: 37879

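If you need to classify quality based on residual.sugar, then quality is your cl argument. The error occurs because cl must contain exactly one class label per row of train, and the 6-element factor in the question does not. This is written in the documentation as well: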

cl: factor of true classifications of training set
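
The length requirement can be checked directly before calling knn. Here is a minimal sketch on a small made-up data frame; toy_train, toy_test, cl_ok, cl_bad and all values are hypothetical stand-ins, not the wine data:

```r
library(class)

# Hypothetical stand-in for the wine data: one predictor, one label column.
toy_train <- data.frame(residual.sugar = c(1.9, 2.6, 2.3, 1.8, 6.1, 5.5),
                        quality        = c(5, 5, 6, 5, 7, 7))
toy_test  <- data.frame(residual.sugar = c(2.0, 6.0))

# cl needs one label per training row.
cl_ok <- factor(toy_train$quality)
stopifnot(length(cl_ok) == nrow(toy_train))

pred <- knn(train = toy_train["residual.sugar"],
            test  = toy_test["residual.sugar"],
            cl    = cl_ok, k = 1)
print(pred)

# A cl of the wrong length (3 labels for 6 rows) triggers the same error
# as in the question; tryCatch captures the message instead of stopping.
cl_bad <- factor(c("a", "b", "c"))
err <- tryCatch(
  knn(train = toy_train["residual.sugar"],
      test  = toy_test["residual.sugar"],
      cl    = cl_bad, k = 1),
  error = function(e) conditionMessage(e)
)
print(err)
```

Comparing length(cl) with nrow(train) before the call is usually the quickest way to diagnose this message.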

So, in order to run your knn model you need to do:

library("class")

mydata <- read.table("http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv", sep=";", header=TRUE);

index <- 1:nrow(mydata)
testindex <- sample(index, trunc(length(index)/6))
trainset <-mydata[testindex,]
testset <- mydata[-testindex,]

knn(train = trainset['residual.sugar'],        # residual.sugar is the only predictor
    test  = testset['residual.sugar'],         # the test set uses the same column
    cl    = as.factor(trainset[['quality']]),  # quality: true classes of the training rows
    k = 1, l = 0, prob = FALSE, use.all = TRUE)

There is no need to define cl beforehand.
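
One detail carried over from the question: trainset is built from testindex, so the model trains on one sixth of the rows and predicts the other five sixths. If the reverse split was intended, it could look roughly like the sketch below. It uses a simulated data frame (sim, with made-up values and an arbitrary seed) so that it runs without the download; only the column names match the wine data.

```r
library(class)

set.seed(42)  # arbitrary seed so the split is reproducible

# Simulated stand-in for the wine data (values are made up).
n <- 60
sim <- data.frame(residual.sugar = runif(n, min = 1, max = 10),
                  quality        = sample(3:8, n, replace = TRUE))

# Hold out one sixth of the rows for testing; train on the rest.
testindex <- sample(seq_len(nrow(sim)), trunc(nrow(sim) / 6))
testset   <- sim[testindex, ]
trainset  <- sim[-testindex, ]

pred <- knn(train = trainset["residual.sugar"],
            test  = testset["residual.sugar"],
            cl    = factor(trainset$quality),
            k     = 1)

# Cross-tabulate predicted against actual quality for the held-out rows.
print(table(predicted = pred, actual = testset$quality))
```

The table gives a quick view of how often the predicted quality matches the actual one on the held-out rows.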

Upvotes: 1
