firmo23

Reputation: 8404

Manually set the legend names

I have the data frame below, which I process in order to create a cluster scatter plot:

library(tidyverse)  # data manipulation
library(cluster)    # clustering algorithms
library(factoextra) # clustering algorithms & visualization
library(plotly)
df <- USArrests
df <- na.omit(df)

df <- scale(df)
distance <- get_dist(df)

k2 <- kmeans(df, centers = 2, nstart = 25)
df %>%
  as_tibble() %>%
  mutate(cluster = k2$cluster,
         state = row.names(USArrests))
p2<-fviz_cluster(k2, data = df, geom="point")
#+ scale_fill_discrete(name = "Cluster", labels = c("1", "2", "3","4"))
p2
ggplotly(p2)

When I use ggplotly(), the legend names change, so I'm looking for a way to set them manually or even hide the legend altogether.

Upvotes: 1

Views: 1842

Answers (1)

Johannes Stötzer

Reputation: 506

The easiest way I have come across is renaming the label within the plotly object.

p2<-fviz_cluster(k2, data = df, geom="point")

p3 <- ggplotly(p2)

p3[["x"]][["data"]][[2]][["name"]] <- "2"
p3

It's not pretty but helps in the short term.
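For the other half of the question, hiding the legend altogether, here is a minimal sketch. It assumes the object returned by ggplotly() behaves like any other plotly object, so plotly's layout() can switch the legend off. The ggplot below is only a stand-in for the fviz_cluster() plot, and the names dat, p, p3 and p3_nolegend are made up for this illustration.

```r
library(ggplot2)
library(plotly)

# Stand-in for the fviz_cluster() plot (assumption: any ggplot with a
# colour/shape legend is handled the same way by ggplotly()).
dat <- as.data.frame(scale(USArrests))
set.seed(42)  # kmeans() uses random starts
dat$cluster <- factor(kmeans(dat, centers = 2, nstart = 25)$cluster)

p <- ggplot(dat, aes(Murder, Assault, colour = cluster, shape = cluster)) +
  geom_point()

p3 <- ggplotly(p)

# Switch the legend off on the plotly side.
# plotly::layout() is called with its namespace because graphics::layout()
# has the same name.
p3_nolegend <- plotly::layout(p3, showlegend = FALSE)
p3_nolegend
```

plotly also exports hide_legend(), which should give a similar result; I have only sketched the layout() variant here.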

Edit: It turned out there was more than one question. The first is about legend labels, the second about interactive points in the plot.

# Most of the example code was given,
# only change of center variable

# Example
library(tidyverse)  # data manipulation
library(cluster)    # clustering algorithms
library(factoextra) # clustering algorithms & visualization
library(plotly)
df <- USArrests
df <- na.omit(df)

df <- scale(df)
distance <- get_dist(df)

# added centers variable for the number of centers in kmeans
# this will also be used to select elements from ggplot or ggplotly later

centers <- 4
k2 <- kmeans(df, centers = centers, nstart = 25)
df %>%
  as_tibble() %>%
  mutate(cluster = k2$cluster,
         state = row.names(USArrests))

p2<-fviz_cluster(k2, data = df, geom="point")

p2
p3 <- ggplotly(p2)

# Solution
# First problem: changing legend labels
# The conversion from ggplot to ggplotly messes up legends that
#   combine multiple scales, as here (color and shape).
# Why it looks as intended when only the point layers are renamed,
#   I don't know.

for (i in seq_len(centers)) {
  # trace names are strings, so convert the index explicitly
  p3[["x"]][["data"]][[i]][["name"]] <- as.character(i)
}

# Second problem: interactive points
# ggplot saves the data in one list, while ggplotly splits the data
#    by layer and cluster
# for the hover labels it is enough to change the point layers
#    (the first x traces, where x is the number of centers)
# to add more information to the labels,
#   build up the variable name_states with HTML
for (i in 1:centers) {
  name_states <- p2[["data"]]%>%
    filter(cluster==i)%>%
    select(name)

  p3[["x"]][["data"]][[i]][["text"]] <- as.vector(name_states$name)
}

# Reverse the order of the layers, because the polygon layer is on top and
#    makes it impossible to hover over the points beneath
p3[["x"]][["data"]] <- p3[["x"]][["data"]][(centers*3):1]

# Now you can hover over every point and can see the state name
p3
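The index arithmetic above (the first `centers` traces are the points, `centers*3` traces in total) depends on how ggplotly() happened to split the plot into traces. A quick way to check that before editing anything is to list the traces. This is a small sketch on a plain ggplot rather than the fviz_cluster() plot; the names pl and trace_info are made up for the illustration.

```r
library(ggplot2)
library(plotly)

# Small example plot: one point layer, coloured by a factor
p <- ggplot(mtcars, aes(wt, mpg, colour = factor(cyl))) +
  geom_point()
pl <- ggplotly(p)

# Return a field of a trace as a string, or NA if the trace does not have it
field_or_na <- function(trace, field) {
  value <- trace[[field]]
  if (is.null(value)) NA_character_ else as.character(value)[1]
}

# One row per trace: its position in pl$x$data, legend name, type and mode
trace_info <- data.frame(
  index = seq_along(pl$x$data),
  name  = sapply(pl$x$data, field_or_na, field = "name"),
  type  = sapply(pl$x$data, field_or_na, field = "type"),
  mode  = sapply(pl$x$data, field_or_na, field = "mode")
)
print(trace_info)
```

Running the same listing on p3 from the example shows which indices belong to the point, polygon and centroid layers, so the loops and the reordering can be adjusted if the layout differs.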


Upvotes: 2
