Reputation: 783
After training a SOM, how can you plot new data onto it and visualise how the data map onto the nodes? Ideally, I would like each new point plotted at its node location with the corresponding classification colour. identify() can pinpoint data based on selections on the SOM map, but it is very limited and handles only one selection at a time. I would like to map a whole (new) dataset and visualise it. I am able to get the node location using map(), along with the group association, but how can I manually plot the new points onto the SOM? I couldn't find anything pertinent on the internet or in the kohonen R documentation. I'd appreciate any help.
library(kohonen)
data(wines)
wines.train<-wines[1:150,]
wines.test<-wines[151:nrow(wines),]
wines.sc <- scale(wines.train)
set.seed(7)
wines.som<-som(wines.sc, grid = somgrid(5, 4, "hexagonal"),rlen=150,alpha=c(0.05,0.01))
wines.hc<-cutree(hclust(dist(wines.som$codes[[1]])),6)
plot(wines.som,type="mapping",bgcol=rainbow(6)[wines.hc])
add.cluster.boundaries(wines.som,wines.hc)
# identify() can be used to manually inspect specific nodes on the SOM
identify(wines.som$grid$pts,labels=as.vector(wines.hc),plot=TRUE,pos=TRUE)
# map new data onto the trained SOM
wines.map<-map(wines.som,scale(wines.test))
wines.test.grp<-sapply(wines.map$unit.classif,function(x) wines.hc[[x]])
Upvotes: 3
Views: 1124
Reputation: 246
In my opinion, one thing to note is that you should not scale your test data using its own mean and standard deviation. You should scale it with the scaling parameters of your training data, because the model was trained using information from the training data only and has not seen the test data.
So your scaled test data would look like this:
wines.test.scale <- scale(wines.test, center = attr(wines.sc, 'scaled:center'), scale = attr(wines.sc, 'scaled:scale'))
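To make the effect of that line concrete, here is a minimal sketch that rebuilds the same scaling by hand and compares it with scaling the test set on its own statistics. The names ctr, scl, manual and self.scaled are mine and purely illustrative; the only assumption is that wines is the numeric matrix shipped with kohonen.

```r
library(kohonen)
data(wines)

wines.train <- wines[1:150, ]
wines.test  <- wines[151:nrow(wines), ]
wines.sc    <- scale(wines.train)

# scaling parameters learned from the training data only
ctr <- attr(wines.sc, "scaled:center")
scl <- attr(wines.sc, "scaled:scale")

# test data scaled with the training parameters
wines.test.scale <- scale(wines.test, center = ctr, scale = scl)

# the same thing written out by hand: (x - train mean) / train sd
manual <- sweep(sweep(wines.test, 2, ctr, "-"), 2, scl, "/")
same.as.manual <- isTRUE(all.equal(manual, wines.test.scale,
                                   check.attributes = FALSE))
print(same.as.manual)

# for comparison: scaling the test set on its own statistics
self.scaled <- scale(wines.test)

# largest absolute difference between the two versions, per variable
print(round(apply(abs(self.scaled - wines.test.scale), 2, max), 3))
```

The second print shows how far the two versions drift apart per variable; any non-zero value means the test points would be compared with the codebook vectors on a different scale than the one the SOM was trained on.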
Next, you can add a new element to the model object: the distance from each observation to every node's codebook vector. Because the data are split into train and test sets, there are two such elements, one for the training distances and one for the test distances. I call them train.map and test.map, since this step can be regarded as mapping the input data onto the model's map.
wines.som$train.map <- apply(
  wines.sc, 1, function(input1) {
    apply(
      wines.som$codes[[1]], 1, function(input2) dist(rbind(input1, input2))
    )
  }
)
wines.som$test.map <- apply(
  wines.test.scale, 1, function(input1) {
    apply(
      wines.som$codes[[1]], 1, function(input2) dist(rbind(input1, input2))
    )
  }
)
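As a cross-check on these distance matrices, the sketch below computes the test distances in a more compact way and reads off the nearest node for each test sample. I would expect that nearest node to agree with what map() returns, assuming the default distance used by som() ranks nodes the same way as the Euclidean distance, but treat that as an assumption rather than a guarantee. The names codes, test.map, nearest and agreement are illustrative.

```r
library(kohonen)
data(wines)

wines.train <- wines[1:150, ]
wines.test  <- wines[151:nrow(wines), ]
wines.sc    <- scale(wines.train)

set.seed(7)
wines.som <- som(wines.sc, grid = somgrid(5, 4, "hexagonal"),
                 rlen = 150, alpha = c(0.05, 0.01))

wines.test.scale <- scale(wines.test,
                          center = attr(wines.sc, "scaled:center"),
                          scale  = attr(wines.sc, "scaled:scale"))

codes <- wines.som$codes[[1]]   # one row per node

# Euclidean distance from every test sample to every node;
# result has one row per node and one column per test sample
test.map <- apply(wines.test.scale, 1, function(x) {
  sqrt(colSums((t(codes) - x)^2))
})

# index of the closest node for each test sample
nearest <- apply(test.map, 2, which.min)

# compare with the package's own mapping
wines.map <- map(wines.som, wines.test.scale)
agreement <- mean(nearest == wines.map$unit.classif)
print(agreement)
```

If agreement is not 1, the distance measure used internally differs from the plain Euclidean one computed here, and the output of map() is the one to trust.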
I stored these matrices in the model object for convenience, though I don't think it is strictly required. Once the library is attached, plot() dispatches to the package's own method for objects of class kohonen, so the first argument must be the trained model; the property argument itself only needs to be a numeric vector with one value per node.
Now you can plot how each individual input maps onto the model's network. This can be done in two stages: the training data mapping and the test data mapping.
par(mfrow = c(5,5))
for (a in 1:ncol(wines.som$train.map)) {
  plot(
    wines.som, type = 'property', property = wines.som$train.map[,a],
    main = paste('train',a)
  )
}
par(mfrow = c(5,5))
for (a in 1:ncol(wines.som$test.map)) {
  plot(
    wines.som, type = 'property', property = wines.som$test.map[,a],
    main = paste('test',a)
  )
}
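If the goal is a single picture with all test samples placed on their nodes, a possible shortcut is the classif argument of the package's mapping plot, which as far as I know accepts a vector of unit numbers. The sketch below assumes that behaviour (kohonen 3.x) and reuses the clustering from the question; test.unit, test.grp and test.summary are illustrative names, not part of the package.

```r
library(kohonen)
data(wines)

wines.train <- wines[1:150, ]
wines.test  <- wines[151:nrow(wines), ]
wines.sc    <- scale(wines.train)

set.seed(7)
wines.som <- som(wines.sc, grid = somgrid(5, 4, "hexagonal"),
                 rlen = 150, alpha = c(0.05, 0.01))
wines.hc  <- cutree(hclust(dist(wines.som$codes[[1]])), 6)

# scale the test data with the training parameters, then map it
wines.test.scale <- scale(wines.test,
                          center = attr(wines.sc, "scaled:center"),
                          scale  = attr(wines.sc, "scaled:scale"))
wines.map <- map(wines.som, wines.test.scale)

test.unit <- wines.map$unit.classif        # winning node per test sample
test.grp  <- unname(wines.hc[test.unit])   # cluster of that node

# mapping plot of the TEST samples: node background shows the cluster,
# points are the test samples (jittered inside their node by the package)
plot(wines.som, type = "mapping", classif = test.unit,
     bgcol = rainbow(6)[wines.hc], col = "black", pch = 19,
     main = "Test data on trained SOM")
add.cluster.boundaries(wines.som, wines.hc)

# node coordinates and cluster per test sample, for manual plotting
test.summary <- data.frame(
  sample  = seq_along(test.unit),
  unit    = test.unit,
  x       = unname(wines.som$grid$pts[test.unit, 1]),
  y       = unname(wines.som$grid$pts[test.unit, 2]),
  cluster = test.grp,
  row.names = NULL
)
print(head(test.summary))
```

The x and y columns are the node centres from wines.som$grid$pts, so they can also be passed to points() or text() on top of an existing SOM plot if you prefer to draw the samples yourself.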
Upvotes: 0