Reputation: 177
I want to create a predict function that predicts which cluster an observation belongs to.
data(iris)
mydata=iris
m=mydata[1:4]
train=head(m,100)
xNew=head(m,10)
rownames(train)<-1:nrow(train)
norm_eucl=function(train)
train/apply(train,1,function(x)sum(x^2)^.5)
m_norm=norm_eucl(train)
result=kmeans(m_norm,3,30)
predict.kmean <- function(cluster, newdata)
{
simMat <- m_norm(rbind(cluster, newdata),
sel=(1:nrow(newdata)) + nrow(cluster))[1:nrow(cluster), ]
unname(apply(simMat, 2, which.max))
}
## assign new data samples to exemplars
predict.kmean(m_norm, x[result$cluster, ], xNew)
After that I get the error:
Error in predict.kmean(m_norm, x[result$cluster, ], xNew) :
unused argument (xNew)
I understand that I am doing something wrong in the function, since I'm just learning to write one, but I can't work out where exactly.
In fact, I want to adapt a similar function written for apcluster (I saw a similar topic, but it was for apcluster):
predict.apcluster <- function(s, exemplars, newdata)
{
simMat <- s(rbind(exemplars, newdata),
sel=(1:nrow(newdata)) + nrow(exemplars))[1:nrow(exemplars), ]
unname(apply(simMat, 2, which.max))
}
## assign new data samples to exemplars
predict.apcluster(negDistMat(r=2), x[apres@exemplars, ], xNew)
How can I do it?
Upvotes: 1
Views: 1360
Reputation: 48201
Rather than trying to replicate something, let's come up with our own function. For a given vector x, we want to assign a cluster using some prior k-means output. Given how the k-means algorithm works, what we want is to find which cluster's center is closest to x. That can be done as
predict.kmeans <- function(x, newdata)
apply(newdata, 1, function(r) which.min(colSums((t(x$centers) - r)^2)))
That is, we go over newdata row by row, compute the corresponding row's distance to each of the centers, and find the minimal one. Then, e.g.,
head(predict(result, train / sqrt(rowSums(train^2))), 3)
# 1 2 3
# 2 2 2
all.equal(predict(result, train / sqrt(rowSums(train^2))), result$cluster)
# [1] TRUE
which confirms that our predicting function assigned all the same clusters to the training observations. Then also
predict(result, xNew / sqrt(rowSums(xNew^2)))
# 1 2 3 4 5 6 7 8 9 10
# 2 2 2 2 2 2 2 2 2 2
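To check the distance computation by hand, here is a minimal sketch on a case small enough to verify on paper. The centers, the two points, and the name fake_fit are made up for illustration and are not part of the iris example; fake_fit only mimics the centers component and the class of a real kmeans result. The key step is that R recycles the row r down each column of t(x$centers), so each column becomes "center minus r".

```r
# Same function as above, repeated so the sketch is self-contained
predict.kmeans <- function(x, newdata)
  apply(newdata, 1, function(r) which.min(colSums((t(x$centers) - r)^2)))

# Hypothetical stand-in for a kmeans fit: only `centers` and the class are used
fake_fit <- structure(
  list(centers = rbind(c(0, 0), c(10, 10))),
  class = "kmeans"
)

# Two made-up points: the first lies near center 1, the second near center 2
newdata <- rbind(c(1, 2), c(9, 8))

# Squared distances from the first point to each center:
# (0-1)^2 + (0-2)^2 = 5 and (10-1)^2 + (10-2)^2 = 145
sq_dist <- colSums((t(fake_fit$centers) - newdata[1, ])^2)

# predict() dispatches to predict.kmeans because of the class attribute
assigned <- predict(fake_fit, newdata)
print(sq_dist)
print(assigned)
```

The first point is assigned to cluster 1 and the second to cluster 2, matching the smaller of the two squared distances in each case.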
Notice also that I'm simply calling predict rather than predict.kmeans. That is because result is of class kmeans, so the right method is chosen automatically. Also notice how I normalize the data in a vectorized manner, without using apply.
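Both remarks can be checked directly. The sketch below assumes the same iris setup as in the question; the seed and the variable names norm_apply, norm_vec, via_generic and via_method are my own additions for illustration. It compares the apply-based normalization from the question with the vectorized one, and confirms that calling the generic gives the same result as calling the method explicitly.

```r
data(iris)
train <- head(iris[1:4], 100)

# Normalization as in the question: row norms computed with apply()
norm_apply <- train / apply(train, 1, function(x) sum(x^2)^.5)

# Vectorized normalization: row norms computed with rowSums()
norm_vec <- train / sqrt(rowSums(train^2))

# Both approaches should agree up to floating-point tolerance
same_result <- isTRUE(all.equal(norm_apply, norm_vec))

# After normalization every row should have Euclidean length 1
unit_rows <- isTRUE(all.equal(unname(sqrt(rowSums(norm_vec^2))),
                              rep(1, nrow(norm_vec))))

predict.kmeans <- function(x, newdata)
  apply(newdata, 1, function(r) which.min(colSums((t(x$centers) - r)^2)))

set.seed(1)  # assumption: fixed seed only to make the sketch reproducible
result <- kmeans(norm_vec, 3, 30)

# result has class "kmeans", so predict() finds predict.kmeans on its own
via_generic <- predict(result, norm_vec)
via_method  <- predict.kmeans(result, norm_vec)
print(class(result))
print(identical(via_generic, via_method))
```

The vectorized form avoids looping over rows in R code, which tends to matter more as the number of rows grows.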
Upvotes: 6