stats_noob

Reputation: 5907

Coloring clusters

I am using the following code, which fits a SOM (Self-Organizing Map, also called a Kohonen network) to visualize some data. I then run hierarchical clustering on the SOM codebook vectors and cut the tree into 8 clusters:

#load library
library(tidyverse)
library(kohonen)
library(GGally)
library(purrr)
library(tidyr)
library(dplyr)
library(mlr)

#load data
data(flea)
fleaTib <- as_tibble(flea)

#define SOM grid
somGrid <- somgrid(xdim = 5, ydim = 5, topo = "hexagonal",
                   neighbourhood.fct = "bubble", toroidal = FALSE)

#format data
fleaScaled <- fleaTib %>%
  select(-species) %>%
  scale()

#perform som
fleaSom <- som(fleaScaled, grid = somGrid, rlen = 5000,
               alpha = c(0.05, 0.01))

par(mfrow = c(2, 3))
plotTypes <- c("codes", "changes", "counts", "quality",
               "dist.neighbours", "mapping")
walk(plotTypes, ~plot(fleaSom, type = ., shape = "straight"))

getCodes(fleaSom) %>%
  as_tibble() %>%
  iwalk(~plot(fleaSom, type = "property", property = .,
              main = .y, shape = "straight"))

# listing flea species on SOM

par(mfrow = c(1, 2))
nodeCols <- c("cyan3", "yellow", "purple", "red", "blue", "green", "white", "pink")
plot(fleaSom, type = "mapping", pch = 21,
     bg = nodeCols[as.numeric(fleaTib$species)],
     shape = "straight", bgcol = "lightgrey")

# CLUSTER AND ADD TO SOM MAP ---- (8 clusters)
clusters <- cutree(hclust(dist(fleaSom$codes[[1]], 
                               method = "manhattan")), 8)

somClusters <- map_dbl(clusters, ~{
    if(. == 1) 3
    else if(. == 2) 2
    else 1
})

plot(fleaSom, type = "mapping", pch = 21, 
     bg = nodeCols[as.numeric(fleaTib$species)],
     shape = "straight",
     bgcol = nodeCols[as.integer(somClusters)])

add.cluster.boundaries(fleaSom, somClusters)


But in the resulting plot, only 3 background colors are shown instead of 8.

Can someone please show me what I am doing wrong?

Upvotes: 0

Views: 250

Answers (1)

DaveArmstrong

Reputation: 21937

Replace somClusters with clusters in the definition of the background color (bgcol) in the last plot. The main issue is that somClusters is defined to take only three values (1, 2 and 3), not 8. If you use it to index the vector of colors, it can only ever select three colors.

plot(fleaSom, type = "mapping", pch = 21, 
     bg = nodeCols[as.numeric(fleaTib$species)],
     shape = "straight",
     bgcol = nodeCols[as.integer(clusters)])

# use the same 8-cluster vector here so the boundaries match the background colors
add.cluster.boundaries(fleaSom, clusters)
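
To see why the recoding loses colors, here is a minimal sketch in base R that does not need the SOM at all. The clusters vector below is a made-up stand-in for the cutree() output (25 nodes spread over 8 clusters), and ifelse() is used as a vectorised equivalent of the map_dbl() recoding from the question; nColsRecoded and nColsDirect are names introduced only for this illustration.

```r
# Hypothetical stand-in for cutree() output: 25 SOM nodes in 8 clusters
clusters <- rep(1:8, length.out = 25)
nodeCols <- c("cyan3", "yellow", "purple", "red",
              "blue", "green", "white", "pink")

# Same recoding as in the question: cluster 1 -> 3, cluster 2 -> 2,
# and every other cluster (3 to 8) -> 1
somClusters <- ifelse(clusters == 1, 3, ifelse(clusters == 2, 2, 1))

# Count how many distinct colors each index vector can pick
nColsRecoded <- length(unique(nodeCols[somClusters]))  # 3
nColsDirect  <- length(unique(nodeCols[clusters]))     # 8

print(c(recoded = nColsRecoded, direct = nColsDirect))
```

Running table(clusters) and table(somClusters) on your real objects is a quick way to confirm how many groups each vector actually holds before passing it to bgcol.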


Upvotes: 2
