Henk Straten

Reputation: 1447

Add the center points of a cluster in ggplot2

For training purposes, I want to create a Shiny application that outlines the steps of the k-means clustering algorithm. The first step I want to show is the center of two clusters.

I first use ggplot to plot Sepal.Length against Sepal.Width from the iris dataset:

library(ggplot2)

g <- ggplot(data=iris, aes(x=iris$Sepal.Length, y = iris$Sepal.Width))
g + geom_point()          

Then I randomly assign each observation to a cluster:

iris$Cluster <- 0
for(i in 1:nrow(iris)){
  randInt <- round(runif(1, 0, 1))
  iris$Cluster[i] <- if (randInt == 0) 1 else 0
}
iris$Cluster <- as.factor(iris$Cluster)                               
g <- ggplot(data=iris, aes(x=iris$Sepal.Length, y = iris$Sepal.Width, colour = Cluster))
g + geom_point()    

Next, I want to add a point to the plot that marks the center of cluster 0 and the center of cluster 1.

Any thoughts on how I can do this in ggplot2?
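The steps outlined in this question (random assignment to two clusters, then the cluster centers) are the opening steps of k-means. As a minimal base-R sketch, assuming k = 2 and Euclidean distance, one iteration could look like this; compute_centroids() and reassign() are hypothetical helper names, not functions from any package:

```r
# Sketch of one k-means iteration on the two sepal features.
# Assumptions (not from the question): k = 2, Euclidean distance,
# and the hypothetical helpers compute_centroids() and reassign().
set.seed(42)  # make the random assignment reproducible
X <- as.matrix(iris[, c("Sepal.Length", "Sepal.Width")])

# Step 1: assign every observation to cluster 0 or 1 at random
cluster <- sample(0:1, nrow(X), replace = TRUE)

# Step 2: each centroid is the mean of the points in its cluster
compute_centroids <- function(X, cluster) {
  # rowsum() adds up rows per group (groups sorted: "0", "1");
  # dividing by the group sizes turns the sums into means
  rowsum(X, cluster) / as.vector(table(cluster))
}

# Step 3: move every point to the cluster with the nearest centroid
reassign <- function(X, centroids) {
  # d[i, k] = squared Euclidean distance from point i to centroid k
  d <- sapply(seq_len(nrow(centroids)), function(k) {
    colSums((t(X) - centroids[k, ])^2)
  })
  # index of the closest centroid per row, shifted to 0/1 labels
  # (assumes both clusters are non-empty and labelled 0 and 1)
  apply(d, 1, which.min) - 1
}

centroids <- compute_centroids(X, cluster)
new_cluster <- reassign(X, centroids)
print(centroids)
print(table(new_cluster))
```

Repeating steps 2 and 3 until new_cluster stops changing gives the full algorithm, and each intermediate state could be one screen of the Shiny app.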

Upvotes: 4

Views: 3364

Answers (2)

jay.sf

Reputation: 72899

Using base R's aggregate() to compute the centers (ggplot2 is only needed for the plotting), we can do:

library(ggplot2)

iris$Cluster <- as.factor(rbinom(nrow(iris), 1, .5))  # more convenient

ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, colour=Cluster)) +
  geom_point() +
  geom_point(data=aggregate(iris[, c("Sepal.Length", "Sepal.Width")],
                            by=list(Cluster=iris$Cluster), mean),
             size=10, shape=3) +
  theme_bw() + labs(x="Sepal Length", y="Sepal Width", color="Cluster Type")

Yields:

[Image: resulting plot]
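If the app should also display the center coordinates themselves (for example in a table next to the plot), they can be computed on their own. A sketch using aggregate()'s formula interface, which averages only the two sepal columns so mean() is never applied to the factor columns; the variable name centers is just illustrative:

```r
set.seed(1)  # reproducible random clusters
iris$Cluster <- as.factor(rbinom(nrow(iris), 1, .5))

# cbind(...) ~ Cluster averages each listed column within each cluster;
# Species and Cluster are not passed to mean(), so no NA warnings
centers <- aggregate(cbind(Sepal.Length, Sepal.Width) ~ Cluster,
                     data = iris, FUN = mean)
print(centers)
```

The same centers data frame can be passed as data= to the second geom_point() call.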

Upvotes: 0

eipi10

Reputation: 93821

You can calculate the centroid of each cluster on the fly within a second call to geom_point. Here's an example using tidyverse functions. We calculate the mean of Sepal.Length and Sepal.Width within each cluster and plot these mean values using crosses as the point markers. Note also that you shouldn't restate the data frame name within aes, but should instead use column names alone.

library(tidyverse)

# Assign random cluster value
iris$cluster = sample(0:1, nrow(iris), replace=TRUE)

ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, colour=factor(cluster))) +
  geom_point() +
  geom_point(data=iris %>% 
               group_by(cluster) %>% 
               summarise_at(vars(matches("Sepal")), mean),
             size=5, shape=3) +
  theme_classic()

[Image: resulting plot]
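summarise_at() still works, but dplyr 1.0.0 marked it as superseded in favour of across() inside summarise(). A sketch of the same centroid computation in that style, assuming dplyr 1.0.0 or later (centers is an illustrative name):

```r
library(dplyr)

set.seed(1)  # reproducible random clusters
iris$cluster <- sample(0:1, nrow(iris), replace = TRUE)

# across() applies mean() to every column whose name matches "Sepal"
centers <- iris %>%
  group_by(cluster) %>%
  summarise(across(matches("Sepal"), mean))

print(centers)
```

The resulting centers can be supplied as data= to the second geom_point() in place of the inline pipeline.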

Upvotes: 5
