Markm0705

Reputation: 1440

R kohonen - Is the input data scaled and centred automatically?

I have been following an online example for R Kohonen self-organising maps (SOM) which suggested that the data should be centred and scaled before computing the SOM.

However, I've noticed that the object created seems to have attributes for centre and scale. Does that mean centring and scaling first is a redundant step? Example script below:

# Load package
require(kohonen)

# Set data
data(iris)

# Scale and centre
dt <- scale(iris[, 1:4],center=TRUE)

# Prepare SOM
set.seed(590507)
som1 <- som(dt,
            somgrid(6, 6, "hexagonal"),
            rlen = 500,
            keep.data = TRUE)

str(som1)

The output from the last line of the script is:

List of 13
 $ data            :List of 1
  ..$ : num [1:150, 1:4] -0.898 -1.139 -1.381 -1.501 -1.018 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : NULL
  .. .. ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
  .. ..- attr(*, "scaled:center")= Named num [1:4] 5.84 3.06 3.76 1.2
  .. .. ..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
  .. ..- attr(*, "scaled:scale")= Named num [1:4] 0.828 0.436 1.765 0.762
  .. .. ..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
 $ unit.classif    : num [1:150] 3 5 5 5 4 2 4 4 6 5 ...
 $ distances       : num [1:150] 0.0426 0.0663 0.0768 0.0744 0.1346 ...
 $ grid            :List of 6
  ..$ pts              : num [1:36, 1:2] 1.5 2.5 3.5 4.5 5.5 6.5 1 2 3 4 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : NULL
  .. .. ..$ : chr [1:2] "x" "y"
  ..$ xdim             : num 6
  ..$ ydim             : num 6
  ..$ topo             : chr "hexagonal"
  ..$ neighbourhood.fct: Factor w/ 2 levels "bubble","gaussian": 1
  ..$ toroidal         : logi FALSE
  ..- attr(*, "class")= chr "somgrid"
 $ codes           :List of 1
  ..$ : num [1:36, 1:4] -0.376 -0.683 -0.734 -1.158 -1.231 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:36] "V1" "V2" "V3" "V4" ...
  .. .. ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
 $ changes         : num [1:500, 1] 0.0445 0.0413 0.0347 0.0373 0.0337 ...
 $ alpha           : num [1:2] 0.05 0.01
 $ radius          : Named num [1:2] 3.61 0
  ..- attr(*, "names")= chr [1:2] "66.66667%" ""
 $ user.weights    : num 1
 $ distance.weights: num 1
 $ whatmap         : int 1
 $ maxNA.fraction  : int 0
 $ dist.fcts       : chr "sumofsquares"
 - attr(*, "class")= chr "kohonen"

Notice that the output contains "scaled:center" and "scaled:scale" attributes on the data element. I would appreciate an explanation of what is happening here.

Upvotes: 1

Views: 158

Answers (1)

Alex

Reputation: 360

Your scaling step is not redundant: the source code of som() does no scaling. The "scaled:center" and "scaled:scale" attributes you see in the output are carried over from the training dataset, where scale() attached them. To check this, run the following code and compare the results:

# Load package
require(kohonen)

# Set data
data(iris)

# Scale and centre
dt <- scale(iris[, 1:4],center=TRUE)
#compare train datasets
str(dt)
str(as.matrix(iris[, 1:4]))

# Prepare SOM
set.seed(590507)
som1 <- kohonen::som(dt,
                     kohonen::somgrid(6, 6, "hexagonal"),
                     rlen = 500,
                     keep.data = TRUE)
# without scaling (same seed, so only the input data differs)
set.seed(590507)
som2 <- kohonen::som(as.matrix(iris[, 1:4]),
                     kohonen::somgrid(6, 6, "hexagonal"),
                     rlen = 500,
                     keep.data = TRUE)
# compare results of som function
str(som1)
str(som2)
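
As a side check that needs only base R, the sketch below shows that scale() itself attaches the two attributes, and that they are simply the column means and standard deviations of the raw data. The variable names (x, ctr, scl and the three logical flags) are my own, chosen for illustration. Since som() appears to store the matrix as passed (judging by the str(som1) output in the question), the attributes just travel along with it:

```r
# Base R only: where do "scaled:center" and "scaled:scale" come from?
data(iris)
x  <- as.matrix(iris[, 1:4])
dt <- scale(x, center = TRUE, scale = TRUE)

# scale() records the column means and standard deviations it used
# as attributes on the matrix it returns.
ctr <- attr(dt, "scaled:center")
scl <- attr(dt, "scaled:scale")

# These match the statistics of the raw data ...
centre_matches <- isTRUE(all.equal(ctr, colMeans(x)))
scale_matches  <- isTRUE(all.equal(scl, apply(x, 2, sd)))

# ... whereas the unscaled matrix carries no such attribute.
raw_has_attrs <- !is.null(attr(x, "scaled:center"))

print(c(centre = centre_matches, scale = scale_matches, raw = raw_has_attrs))
```

If the same attributes are absent from str(som2), that is consistent with som() not doing any centring or scaling of its own.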

Upvotes: 1
