Presenting new data to a fitted self-organizing map and assigning rows to clusters

Question

I am using this code, which fits a self-organizing map (SOM) and then clusters the resulting prototype vectors to define cluster boundaries:

library(dplyr)
library(kohonen)

# Forward slashes: a bare backslash starts an escape sequence in an R string
setwd('C:/Users/Bla/Source/Repos/SomeExcitingRepo')

OrginalData <- read.table("IrisData.txt",
                          header = TRUE, sep = "\t")

SubsetData <- subset(OrginalData, select = c("SepalLength", "SepalWidth", "PetalLength", "PetalWidth"))
TrainingMatrix <- as.matrix(scale(SubsetData))

GridDefinition <- somgrid(xdim = 4, ydim = 4, topo = "hexagonal")

SomModel <- kohonen::supersom(data = TrainingMatrix, grid = GridDefinition, rlen = 1000, alpha = c(0.05, 0.01),
             keep.data = TRUE)
groups = 3
iris.hc = cutree(hclust(dist(SomModel$codes[[1]])), groups)

plot(SomModel, type = "codes", bgcol = rainbow(groups)[iris.hc])
add.cluster.boundaries(SomModel, iris.hc)

The data is the iris dataset but that's just an example. The format of the dataset is as follows:

Uid SepalLength SepalWidth  PetalLength PetalWidth  Species
1   5.1 3.5 1.4 0.2 setosa

Let's now assume this is an unseen dataset. I would like to normalize it, present it to the SOM, and then add columns to each row indicating the SOM's cluster number (1, 2, or 3; see the example above) and the winning node's x and y coordinates. Example:

Uid SepalLength SepalWidth PetalLength PetalWidth Species Cluster X Y
1 5.1 3.5 1.4 0.2 setosa 3 3 4

HubertL · Accepted Answer

unit.classif holds the index of the winning unit for each training row, so you can use it to index both the cluster vector and the grid points:

result <- OrginalData
result$Cluster <- iris.hc[SomModel$unit.classif]
result$X <- SomModel$grid$pts[SomModel$unit.classif,"x"]
result$Y <- SomModel$grid$pts[SomModel$unit.classif,"y"]

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species Cluster   X         Y
1          5.1         3.5          1.4         0.2  setosa       1 1.5 2.5980762
2          4.9         3.0          1.4         0.2  setosa       1 1.0 3.4641016
3          4.7         3.2          1.3         0.2  setosa       1 1.0 3.4641016
4          4.6         3.1          1.5         0.2  setosa       1 1.0 3.4641016
5          5.0         3.6          1.4         0.2  setosa       1 1.0 1.7320508
6          5.4         3.9          1.7         0.4  setosa       1 1.5 0.8660254

Overlaying the rows on the codes plot doesn't look so good, though:

points(jitter(result$X), jitter(result$Y), col=result$Species)
legend(5,0, legend=unique(result$Species), col=unique(result$Species), pch=1)
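
The code above labels the rows the map was trained on. For rows the map has not seen, a sketch along the following lines may help. It assumes the built-in iris data in place of IrisData.txt, an arbitrary 100/50 split into training and "unseen" rows, and som() instead of supersom() for brevity; the names idx, train, unseen, ctr, scl and NewMatrix are made up for illustration. The key points are that the new rows are scaled with the training set's center and scale rather than their own, and that kohonen::map() returns the winning unit for each new row.

```r
library(kohonen)

set.seed(42)  # only to make the sketch reproducible

# Assumption: built-in iris stands in for IrisData.txt, split into
# 100 training rows and 50 rows treated as unseen.
idx    <- sample(nrow(iris), 100)
train  <- iris[idx, 1:4]
unseen <- iris[-idx, ]

# Scale the training data and keep the parameters that scale() stores.
TrainingMatrix <- scale(as.matrix(train))
ctr <- attr(TrainingMatrix, "scaled:center")
scl <- attr(TrainingMatrix, "scaled:scale")

# Fit the map and cluster its codebook vectors, as in the question.
SomModel <- som(TrainingMatrix,
                grid = somgrid(xdim = 4, ydim = 4, topo = "hexagonal"),
                rlen = 1000, alpha = c(0.05, 0.01))
groups  <- 3
iris.hc <- cutree(hclust(dist(SomModel$codes[[1]])), groups)

# Normalize the unseen rows with the TRAINING center and scale.
NewMatrix <- scale(as.matrix(unseen[, 1:4]), center = ctr, scale = scl)

# Find the winning unit for every unseen row.
mapped <- kohonen::map(SomModel, newdata = NewMatrix)
unit   <- mapped$unit.classif

# Attach cluster number and grid coordinates of the winning unit.
result <- unseen
result$Cluster <- iris.hc[unit]
result$X <- SomModel$grid$pts[unit, "x"]
result$Y <- SomModel$grid$pts[unit, "y"]

head(result)
```

With a hexagonal grid the X and Y values are plotting coordinates, not integer row and column indices, which is why they come out fractional, as in the table above.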
