Reputation: 59385
I'm trying to (partially) reproduce the cluster plot available through s.class(...) in package ade4 using ggplot, but this question is actually much more general.
NB: This question refers to "star plots", but really only discusses spider plots.
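library(ade4)
# PCA on a subset of mtcars; keep the scores
df <- mtcars[, c(1, 3, 4, 5, 6, 7)]
pca <- prcomp(df, scale. = TRUE, retx = TRUE)
scores <- data.frame(pca$x)
# k-means clusters, then the ade4 star plot on the first two PCs
km <- kmeans(df, centers = 3)
plot.df <- cbind(scores$PC1, scores$PC2)
s.class(plot.df, factor(km$cluster))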
The essential feature I'm looking for is the "stars", i.e. a set of lines radiating from a common point (here, the cluster centroids) to a number of other points (here, the points in the cluster).
Is there a way to do that using the ggplot package? If not directly through ggplot, then does anyone know of an add-in that works? For example, there are several variations on stat_ellipse(...), which is not part of the ggplot package.
Upvotes: 8
Views: 3866
Reputation: 59385
This answer is based on @agstudy's response and the suggestions made in @Henrik's comment. Posting because it's shorter and more directly applicable to the question.
Bottom line is this: star plots are readily made with ggplot using geom_segment(...). Using df, pca, scores, and km from the question:
library(ggplot2)
# build ggplot dataframe with points (x,y) and corresponding groups (cluster)
gg <- data.frame(cluster=factor(km$cluster), x=scores$PC1, y=scores$PC2)
# calculate group centroid locations
centroids <- aggregate(cbind(x,y)~cluster,data=gg,mean)
# merge centroid locations into ggplot dataframe
gg <- merge(gg,centroids,by="cluster",suffixes=c("",".centroid"))
# generate star plot...
ggplot(gg) +
  geom_point(aes(x=x,y=y,color=cluster), size=3) +
  geom_point(data=centroids, aes(x=x, y=y, color=cluster), size=4) +
  geom_segment(aes(x=x.centroid, y=y.centroid, xend=x, yend=y, color=cluster))
The result is identical to that obtained with s.class(...).
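The same steps can be wrapped in a small helper that works on any set of points and a grouping factor. The sketch below is an illustration only: star_segments is a hypothetical name, not a function from ggplot2 or ade4, and it uses base R's ave() in place of aggregate() plus merge() so that the original row order is kept.

```r
# Hypothetical helper (not part of ggplot2 or ade4): attach each point's
# group centroid so that geom_segment() can draw centroid-to-point lines.
star_segments <- function(x, y, group) {
  group <- factor(group)
  data.frame(
    cluster = group,
    x = x,
    y = y,
    # ave() returns the group mean, repeated for every row of that group
    x.centroid = ave(x, group, FUN = mean),
    y.centroid = ave(y, group, FUN = mean)
  )
}

# Toy data (made-up values): two groups of three points each
toy <- star_segments(
  x = c(0, 2, 4, 10, 12, 14),
  y = c(0, 3, 0, 5, 8, 5),
  group = c("a", "a", "a", "b", "b", "b")
)
print(toy)
```

The returned data frame has the same column names as gg above, so it can be passed to the same ggplot(...) call.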
Upvotes: 7
Reputation: 121588
The difficulty here is creating the data, not the plot itself. You should go through the code of the package and extract what is useful for you. This should be a good start:
dfxy <- plot.df
df <- data.frame(dfxy)
x <- df[, 1]
y <- df[, 2]
fac <- factor(km$cluster)
# one indicator (0/1) column per cluster
f1 <- function(cl) {
  n <- length(cl)
  cl <- as.factor(cl)
  x <- matrix(0, n, length(levels(cl)))
  x[(1:n) + n * (unclass(cl) - 1)] <- 1
  dimnames(x) <- list(names(cl), levels(cl))
  data.frame(x)
}
wt <- rep(1, length(fac))
dfdistri <- f1(fac) * wt
w1 <- unlist(lapply(dfdistri, sum))
# normalise each column so its weights sum to 1
dfdistri <- t(t(dfdistri) / w1)
## create a data.frame
# segment length multiplier: 1 ends each segment at its point
cstar <- 2
ll <- lapply(seq_len(ncol(dfdistri)), function(i) {
  z1 <- dfdistri[, i]
  z <- z1[z1 > 0]
  x <- x[z1 > 0]
  y <- y[z1 > 0]
  z <- z / sum(z)
  # weighted centroid of cluster i
  x1 <- sum(x * z)
  y1 <- sum(y * z)
  hx <- cstar * (x - x1)
  hy <- cstar * (y - y1)
  data.frame(x = x1, y = y1, xend = x1 + hx, yend = y1 + hy, center = factor(i))
})
dat <- do.call(rbind, ll)
library(ggplot2)
ggplot(dat, aes(x = x, y = y)) +
  geom_point(aes(shape = center)) +
  geom_segment(aes(yend = yend, xend = xend, color = center, group = center))
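With unit weights, the weighted centroid computed inside the lapply() call above reduces to the per-cluster mean, and cstar only scales the segment length. A minimal check on made-up numbers (toy values, not taken from the mtcars example; ends is a hypothetical helper name):

```r
# Toy cluster of three points (made-up values)
x <- c(1, 2, 6)
y <- c(0, 4, 2)
z <- rep(1, 3) / 3   # unit weights, normalised to sum to 1

# Weighted centroid, computed the same way as in the code above
x1 <- sum(x * z)
y1 <- sum(y * z)

# Segment end points for a given cstar (hypothetical helper)
ends <- function(cstar) {
  data.frame(xend = x1 + cstar * (x - x1), yend = y1 + cstar * (y - y1))
}

print(c(x1, y1))   # same as c(mean(x), mean(y))
print(ends(1))     # end points coincide with the data points
print(ends(2))     # end points lie twice as far from the centroid
```

Setting cstar to 1 therefore gives segments that stop exactly at the points.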
Upvotes: 4