stats_noob

Reputation: 5925

Error: Mean Distance Between Objects Zero

I am trying to learn about the "kohonen" package in R. In particular, I am trying to apply its "supersom()" function (https://www.rdocumentation.org/packages/kohonen/versions/3.0.10/topics/supersom), which implements the SOM (Self-Organizing Maps) algorithm used in unsupervised machine learning, to some data.

From a previous question (R error: "Error in check.data : Argument Should be Numeric"), I learned how to apply the "supersom()" function to some artificially created data with both "factor" and "numeric" variables. The code is below.

# The following code works

# Load libraries
library(kohonen)
library(dplyr)

# Create and format data
a <- rnorm(1000, 10, 10)
b <- rnorm(1000, 10, 5)
c <- rnorm(1000, 5, 5)
d <- rnorm(1000, 5, 10)

# Only 100 values each; data.frame() recycles them to 1000 rows
e <- sample(LETTERS[1:4], 100, replace = TRUE, prob = c(0.25, 0.25, 0.25, 0.25))
f <- sample(LETTERS[1:5], 100, replace = TRUE, prob = c(0.2, 0.2, 0.2, 0.2, 0.2))
g <- sample(LETTERS[1:2], 100, replace = TRUE, prob = c(0.5, 0.5))

data <- data.frame(a, b, c, d, e, f, g)
data$e <- as.factor(data$e)
data$f <- as.factor(data$f)
data$g <- as.factor(data$g)

# Standardize the four numeric columns
cols <- 1:4
data[cols] <- scale(data[cols])

# SOM model
som <- supersom(data = as.list(data), grid = somgrid(10, 10, "hexagonal"),
                dist.fct = "euclidean", keep.data = TRUE)

Everything works well. The problem is that when I try to apply the "supersom()" function to bigger, more realistic data, I get the following error:

"Error: Non-informative layers present: mean distance between objects zero"

When I look at the source code for this function (https://rdrr.io/cran/kohonen/src/R/supersom.R), I notice a reference to the same error:

if (any(sapply(meanDistances, mean) < .Machine$double.eps))
  stop("Non-informative layers present: mean distance between objects zero")
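
To make the check concrete, here is a rough base-R sketch of what "mean distance between objects zero" means for a single layer. It is not kohonen's own code: `layer_mean_distance()` is a hypothetical helper, and it assumes Euclidean distance with factors one-hot encoded, which only approximates how `supersom()` handles each element of the list as a separate layer.

```r
# Hypothetical helper (not part of kohonen): mean pairwise Euclidean
# distance between the rows of one layer.
layer_mean_distance <- function(x) {
  if (is.factor(x)) {
    # One-hot encode the factor: one 0/1 column per declared level.
    x <- outer(as.integer(x), seq_len(nlevels(x)), "==") * 1
  }
  mean(as.vector(dist(as.matrix(x))))
}

# Toy layers: one informative, two with no variation at all.
layers <- list(
  varying   = c(1, 2, 3),
  constant  = c(0, 0, 0),
  one_level = factor(c("A", "A", "A"))
)

mean_dist <- sapply(layers, layer_mean_distance)

# Same comparison as in the supersom() source quoted above.
flagged <- names(mean_dist)[mean_dist < .Machine$double.eps]
print(flagged)
```

Under these assumptions, any layer in which all rows are identical ends up in `flagged`, whether it is numeric or a factor.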
      

Can someone please show me how I might be able to resolve this error, i.e. make the "supersom()" function work with factor and numeric data?

I thought that perhaps removing duplicate rows and NAs might fix this problem:

data <- na.omit(data)
data <- unique(data)

However, the same error ("Non-informative layers present: mean distance between objects zero") is still there.

Can someone please help me figure out what might be causing this error? Note: when I remove the "factor" variables, everything works fine.
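
Since the error disappears once the factor columns are removed, one thing that may be worth checking is whether any factor in the real data has fewer than two levels that actually occur (this can happen after filtering rows, which leaves unused levels behind). A minimal base-R sketch, where `real_data` is a made-up stand-in for the actual data frame:

```r
# Made-up stand-in for the real data: `region` declares two levels,
# but only one of them actually occurs.
real_data <- data.frame(
  income = c(10.5, 20.1, 15.3, 30.2),
  gender = factor(c("M", "F", "F", "M")),
  region = factor(c("N", "N", "N", "N"), levels = c("N", "S"))
)

# Restrict the report to the factor columns.
factor_cols <- real_data[vapply(real_data, is.factor, logical(1))]

# Compare the declared levels with the levels that are really observed.
level_report <- data.frame(
  declared = vapply(factor_cols, nlevels, integer(1)),
  observed = vapply(factor_cols, function(x) nlevels(droplevels(x)), integer(1)),
  row.names = names(factor_cols)
)
print(level_report)

# Factors with fewer than two observed levels cannot separate any rows.
suspects <- rownames(level_report)[level_report$observed < 2]
print(suspects)
```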

Sources:

https://cran.r-project.org/web/packages/kohonen/kohonen.pdf

https://www.rdocumentation.org/packages/kohonen/versions/2.0.5/topics/supersom

https://rdrr.io/cran/kohonen/src/R/supersom.R

Upvotes: 1

Views: 365

Answers (1)

Ronak Shah

Reputation: 389155

The error happens if you have numeric columns that contain only 0, so that their mean is 0 and every pair of rows is zero distance apart in that layer. You can reproduce the error by setting any one column to 0.

data$a <- 0
som <- supersom(data = as.list(data), grid = somgrid(10, 10, "hexagonal"),
                dist.fct = "euclidean", keep.data = TRUE)

Error in supersom(data = as.list(data), grid = somgrid(10, 10, "hexagonal"), : Non-informative layers present: mean distance between objects zero

Maybe you can investigate why those columns have a mean of 0, or remove the columns with a mean of 0 from the data.

library(kohonen)
library(dplyr)

# Keep non-numeric columns and numeric columns whose mean is above 0
data <- data %>% select(where(~ (is.numeric(.) && mean(.) > 0) | !is.numeric(.)))

# SOM model
som <- supersom(data = as.list(data), grid = somgrid(10, 10, "hexagonal"),
                dist.fct = "euclidean", keep.data = TRUE)
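
One caveat with filtering on the mean: it is only a proxy for "no variation". A column that has been passed through `scale()` has a mean of approximately 0 yet plenty of variation, while a column that is constantly 5 has a positive mean and none. A possible alternative, sketched here in base R with made-up data, is to filter on the number of distinct values instead. `has_variation()` is a hypothetical helper, not part of kohonen or dplyr.

```r
set.seed(1)
toy <- data.frame(
  scaled = as.numeric(scale(rnorm(20))),  # mean ~ 0, standard deviation 1
  five   = rep(5, 20),                    # mean 5, but constant
  grp    = factor(rep(c("A", "B"), 10))
)

# Hypothetical helper: TRUE if a column has at least two distinct
# non-missing values.
has_variation <- function(x) length(unique(x[!is.na(x)])) > 1

keep <- vapply(toy, has_variation, logical(1))
toy_clean <- toy[, keep, drop = FALSE]
print(names(toy_clean))
```

Here `scaled` is kept despite its near-zero mean and `five` is dropped despite its positive mean. The cleaned data frame could then be passed to `supersom()` through `as.list()` in the same way as above.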

Upvotes: 1
