Reputation: 33
I have a dataset with 60,000 observations and 40 variables, which I clustered with clara, mainly because of memory constraints.
library(cluster)
library(dplyr)
dat <- mutate(kddnew, Att = ifelse(Class == "normal", "normal", "attack"))
ds <- dat[, c(-20, -21, -40)]
clus <- clara(ds, 3, samples=500, sampsize=100, pamLike=TRUE)
This returned a table with medoids.
Now I'm trying to use knn to do a prediction like this:
medoidz <- clus$medoids
r <- knn(medoidz, ds, cl=ds$targetvariable)
And it returns this error:
'train' and 'class' have different lengths
Can someone please shed some light on how to use it?
Upvotes: 3
Views: 1210
Reputation: 23200
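knn needs one entry in cl for every row of train. You passed the 3 medoids as train and a full-length vector as cl, hence the error. This works: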
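require(cluster)
require(class)
data(iris)
ds <- iris
ds$y <- as.numeric(ds$Species)
ds$Species <- NULL
# Random 60/40 split on row indices (rbinom would only return 0, 1 or 2)
idx <- sample(nrow(ds), size = round(0.6 * nrow(ds)))
training <- ds[idx,]
testing <- ds[-idx,]
x <- training
y <- training$y
x1 <- testing
y1 <- testing$y
# Cluster the training rows; clus$clustering holds one label per row of x
clus <- clara(x, 3, samples = 1, sampsize = nrow(x), pamLike = TRUE)
# cl has the same length as nrow(train), so knn accepts it
knn(train = x, test = x1, cl = clus$clustering, k = 10, l = 0, prob = TRUE, use.all = TRUE)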
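If you would rather predict straight from the medoids, as in the question, one option is to treat each medoid as a single labelled training point and use k = 1. This is only a sketch under my own assumptions: the split sizes, the seed and the names feats, medoid_id and pred are made up for illustration, and it uses only the numeric iris columns.

```r
library(cluster)
library(class)

set.seed(1)
feats <- iris[, 1:4]                  # numeric columns only
idx   <- sample(nrow(feats), 90)      # hypothetical 60/40 split
train <- feats[idx, ]
test  <- feats[-idx, ]

clus <- clara(train, 3, samples = 50, sampsize = 40, pamLike = TRUE)

# One label per medoid row, so 'train' and 'cl' have the same length
medoid_id <- factor(seq_len(nrow(clus$medoids)))

# k = 1: each test row gets the id of its nearest medoid (Euclidean distance)
pred <- knn(train = clus$medoids, test = test, cl = medoid_id, k = 1)
table(pred)
```

With k = 1 this is just nearest-medoid assignment, which avoids keeping the full training set in memory for the prediction step.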
That said, 3 is clearly a poor choice for the number of clusters in this dataset, so the prediction isn't good. Hopefully you'll choose the right number of clusters for your data, and you can test your prediction strength with prediction.strength from the package fpc, or in other ways.
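A rough sketch of how prediction.strength could be called with clara, assuming the claraCBI interface from fpc fits your setup. The range of k (Gmin, Gmax), the number of splits M and the seed are arbitrary values picked for illustration, not recommendations.

```r
library(fpc)

set.seed(1)
x <- as.matrix(iris[, 1:4])           # numeric columns only

ps <- prediction.strength(
  x, Gmin = 2, Gmax = 5, M = 10,      # try k = 2..5 over 10 random splits
  clustermethod = claraCBI,           # fpc's wrapper around pam/clara
  classification = "centroid",
  centroidname = "medoids",           # classify by nearest medoid
  usepam = FALSE                      # passed to claraCBI: use clara, not pam
)

ps$mean.pred                          # mean prediction strength per k
ps$optimalk                           # largest k whose strength exceeds the cutoff
```

On a dataset of your size you would probably want a smaller M first to see how long one run takes.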
Upvotes: 4