Reputation: 17428
I am using this code, which fits a self-organizing map (SOM) and then clusters the resulting prototype vectors to define cluster boundaries:
library(dplyr)
library(kohonen)
setwd('C:\\Users\\Bla\\Source\\Repos\\SomeExcitingRepo')
OrginalData <- read.table("IrisData.txt",
header = TRUE, sep = "\t")
SubsetData <- subset(OrginalData, select = c("SepalLength", "SepalWidth", "PetalLength", "PetalWidth"))
TrainingMatrix <- as.matrix(scale(SubsetData))
GridDefinition <- somgrid(xdim = 4, ydim = 4, topo = "hexagonal")
SomModel <- kohonen::supersom(data = TrainingMatrix, grid = GridDefinition, rlen = 1000, alpha = c(0.05, 0.01),
keep.data = TRUE)
groups = 3
iris.hc = cutree(hclust(dist(SomModel$codes[[1]])), groups)
plot(SomModel, type = "codes", bgcol = rainbow(groups)[iris.hc])
add.cluster.boundaries(SomModel, iris.hc)
The data is the iris dataset but that's just an example. The format of the dataset is as follows:
Uid SepalLength SepalWidth PetalLength PetalWidth Species
1 5.1 3.5 1.4 0.2 setosa
Let's now assume this is an unseen dataset. I would like to normalize it, present it to the SOM, and then add columns to each row giving the SOM's cluster number (1, 2, or 3; see the example above) and the winning node's x and y coordinates. Example:
Uid SepalLength SepalWidth PetalLength PetalWidth Species Cluster X Y
1 5.1 3.5 1.4 0.2 setosa 3 3 4
Upvotes: 3
Views: 336
Reputation: 19544
You can use SomModel$unit.classif, which holds the winning unit for each training row, to index both the cluster vector and the grid points:
result <- OrginalData
result$Cluster <- iris.hc[SomModel$unit.classif]
result$X <- SomModel$grid$pts[SomModel$unit.classif,"x"]
result$Y <- SomModel$grid$pts[SomModel$unit.classif,"y"]
Sepal.Length Sepal.Width Petal.Length Petal.Width Species Cluster X Y
1 5.1 3.5 1.4 0.2 setosa 1 1.5 2.5980762
2 4.9 3.0 1.4 0.2 setosa 1 1.0 3.4641016
3 4.7 3.2 1.3 0.2 setosa 1 1.0 3.4641016
4 4.6 3.1 1.5 0.2 setosa 1 1.0 3.4641016
5 5.0 3.6 1.4 0.2 setosa 1 1.0 1.7320508
6 5.4 3.9 1.7 0.4 setosa 1 1.5 0.8660254
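The X and Y values above are plotting coordinates on the hexagonal grid, which is why they are fractional. If integer column and row indices like those in the question's example are wanted, a minimal sketch follows. It assumes units are numbered along x first and then y, which is my understanding of how somgrid() lays them out; unit_to_colrow is a hypothetical helper, not part of kohonen, and the units vector is a stand-in for SomModel$unit.classif.

```r
# Hypothetical helper (not part of kohonen): convert unit numbers to integer
# column/row indices. Assumes units are numbered along x first, then y.
unit_to_colrow <- function(unit, xdim) {
  data.frame(Col = (unit - 1L) %% xdim + 1L,   # position within the grid row
             Row = (unit - 1L) %/% xdim + 1L)  # which grid row the unit is in
}

# Stand-in for SomModel$unit.classif on a 4 x 4 grid
units <- c(9L, 13L, 5L, 1L)
unit_to_colrow(units, xdim = 4)
```

Before relying on this, it is worth comparing the result against SomModel$grid$pts for a few units to confirm the numbering assumption holds for your grid.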
Overlaying the observations on the SOM plot doesn't look so good, though:
# factor() keeps col valid even when Species was read in as character
points(jitter(result$X), jitter(result$Y), col = factor(result$Species))
legend(5, 0, legend = unique(result$Species), col = unique(factor(result$Species)), pch = 1)
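The code above labels the rows the SOM was trained on. For rows it has not seen, one possible approach is to scale them with the training centers and standard deviations and pass them to kohonen's map(). The sketch below assumes kohonen version 3 or later (where codes is a list) and uses the built-in iris data as a stand-in; vars, held_out, TrainData, NewData, NewMatrix and mapped are hypothetical names, and som() is used as the single-layer wrapper around supersom().

```r
library(kohonen)

set.seed(42)  # SOM training is stochastic; fix the seed for repeatability

# Assumption: built-in iris stands in for the real data, and NewData is a
# hypothetical set of rows that the SOM has not seen during training.
vars      <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
held_out  <- sample(nrow(iris), 30)
TrainData <- iris[-held_out, ]
NewData   <- iris[held_out, ]

# Fit on the training rows only, with the settings from the question
TrainingMatrix <- scale(as.matrix(TrainData[, vars]))
SomModel <- som(TrainingMatrix,
                grid = somgrid(xdim = 4, ydim = 4, topo = "hexagonal"),
                rlen = 1000, alpha = c(0.05, 0.01), keep.data = TRUE)
iris.hc <- cutree(hclust(dist(SomModel$codes[[1]])), 3)

# Scale the new rows with the TRAINING centers and standard deviations,
# not with statistics computed from the new rows themselves
NewMatrix <- scale(as.matrix(NewData[, vars]),
                   center = attr(TrainingMatrix, "scaled:center"),
                   scale  = attr(TrainingMatrix, "scaled:scale"))

# map() returns the winning unit for every new row in $unit.classif
mapped <- map(SomModel, newdata = NewMatrix)

result <- NewData
result$Cluster <- unname(iris.hc[mapped$unit.classif])
result$X <- SomModel$grid$pts[mapped$unit.classif, "x"]
result$Y <- SomModel$grid$pts[mapped$unit.classif, "y"]
head(result)
```

Reusing the training scaling matters because the codebook vectors live in that scaled space; rescaling new rows by their own mean and standard deviation would shift them relative to the map.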
Upvotes: 1