For training purposes I want to create a Shiny application that outlines the steps in a k-means clustering algorithm. The first step I want to show is the center of two clusters.
I first use ggplot to plot Sepal.Length against Sepal.Width for the iris dataset:
library(ggplot2)
g <- ggplot(data=iris, aes(x=iris$Sepal.Length, y = iris$Sepal.Width))
g + geom_point()
Then I randomly assign each observation to a cluster:
iris$Cluster <- 0
for (i in seq_len(nrow(iris))) {
  randInt <- round(runif(1, 0, 1), 0)  # random 0 or 1
  iris$Cluster[i] <- if (randInt == 0) 1 else 0
}
iris$Cluster <- as.factor(iris$Cluster)
g <- ggplot(data=iris, aes(x=iris$Sepal.Length, y = iris$Sepal.Width, colour = Cluster))
g + geom_point()
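The loop can be replaced by a single vectorized call. This is a minimal sketch, assuming you want the same random assignment every time the training app runs; the set.seed() call and the seed value 42 are arbitrary additions, not part of the original code.

```r
# Reproducible random cluster assignment (seed 42 is an arbitrary choice)
set.seed(42)
iris$Cluster <- factor(sample(0:1, nrow(iris), replace = TRUE))

# How many observations landed in each cluster
print(table(iris$Cluster))
```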
The next step is to show a point in the plot that marks the center of cluster 0 and the center of cluster 1.
Any thoughts on how I can do this in ggplot2?
Using base R's aggregate() alongside ggplot2, we can do:
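library(ggplot2)
iris$Cluster <- as.factor(rbinom(nrow(iris), 1, .5)) # more convenient than the loop
ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, colour=Cluster)) +
  geom_point() +
  # one cross per cluster at the mean Sepal.Length / Sepal.Width;
  # data= must be named, because geom_point's first argument is mapping
  geom_point(data=aggregate(cbind(Sepal.Length, Sepal.Width) ~ Cluster,
                            data=iris, FUN=mean),
             size=10, shape=3) +
  theme_bw() + labs(x="Sepal Length", y="Sepal Width", color="Cluster Type")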
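To see exactly what the second geom_point layer plots, the centroids can be computed on their own. This sketch (the seed value 1 is an arbitrary assumption for reproducibility) builds the per-cluster mean table with aggregate() and cross-checks it against split() plus colMeans(); both give one row per cluster holding the mean Sepal.Length and Sepal.Width.

```r
set.seed(1)
iris$Cluster <- factor(rbinom(nrow(iris), 1, 0.5))

# Centroid of each cluster: the column means of the points assigned to it
centroids <- aggregate(cbind(Sepal.Length, Sepal.Width) ~ Cluster,
                       data = iris, FUN = mean)
print(centroids)

# Cross-check: split the two columns by cluster and take column means
by_hand <- t(sapply(split(iris[, c("Sepal.Length", "Sepal.Width")], iris$Cluster),
                    colMeans))
print(by_hand)
```

The formula interface of aggregate() only averages the columns named in cbind(), so it avoids the warnings you get when mean() is applied to factor columns such as Species.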
You can calculate the centroid of each cluster on the fly within a second call to geom_point. Here's an example using tidyverse functions. We calculate the mean of Sepal.Length and Sepal.Width within each cluster and plot these mean values using crosses as the point markers. Note also that you shouldn't restate the data frame name within aes, but should instead use column names alone.
library(tidyverse)
# Assign random cluster value
iris$cluster = sample(0:1, nrow(iris), replace=TRUE)
ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, colour=factor(cluster))) +
  geom_point() +
  geom_point(data=iris %>%
               group_by(cluster) %>%
               summarise_at(vars(matches("Sepal")), mean),
             size=5, shape=3) +
  theme_classic()
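Both answers plot the same quantity: the mean position of the points in each cluster. Writing $C_k$ for the set of observations assigned to cluster $k$ and $\mathbf{x}_i = (\text{Sepal.Length}_i, \text{Sepal.Width}_i)$ for observation $i$, the cross drawn for cluster $k$ sits at

```latex
\boldsymbol{\mu}_k = \frac{1}{|C_k|} \sum_{i \in C_k} \mathbf{x}_i,
\qquad k \in \{0, 1\}
```

which is what group_by(cluster) followed by mean (or aggregate(..., FUN = mean)) computes, one coordinate at a time.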