Norhther

Reputation: 500

Network_plot 'names' attribute

I'm trying to make a network plot to show correlations. My code:

camas_desempleo %>% 
  select(-CCAA) %>%
  correlate() %>% 
  network_plot()

Data:

> dput(camas_desempleo)
structure(list(CCAA = c("andalucía", "cataluña", "comunitat valenciana", 
"madrid, comunidad de", "canarias", "castilla - la mancha", "galicia", 
"castilla y león", "país vasco", "extremadura", "murcia, región de", 
"asturias, principado de", "aragón", "balears, illes", "cantabria"
), paro = c(884121, 418438.25, 393648.25, 368107, 225404, 185089.75, 
175023.5, 151656.5, 125436, 109651.75, 106352, 76787.75, 71575.5, 
63432.75, 40508.5), pub = c(572, 511, 450, 479, 187, 155, 215, 
180, 158, 97, 112, 86, 113, 78, 40), priv = c(162, 141, 101, 
225, 50, 13, 48, 20, 21, 5, 11, 7, 22, 46, 0), total = c(734, 
652, 551, 704, 237, 168, 263, 200, 179, 102, 123, 93, 135, 124, 
40)), row.names = c(NA, -15L), class = c("tbl_df", "tbl", "data.frame"
))

The error given is the following:

Error in names(x) <- value : 
  'names' attribute [2] must be the same length as the vector [1]
In addition: Warning message:
In stats::cmdscale(abs(distance)) :
  only 1 of the first 2 eigenvalues are > 0

I can't find anything about this error in the documentation, and a fairly similar example that runs the same code on the mtcars dataset works.

Upvotes: 2

Views: 145

Answers (1)

slava-kohut

Reputation: 4233

The problem arises when network_plot (source code here) tries to execute the following step (lines 188-189):

points <- data.frame(stats::cmdscale(distance))
colnames(points) <-  c("x", "y")
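
The failure at this step can be reproduced in isolation. The following is a minimal sketch, not the package's code: `coords` is a hypothetical stand-in with made-up values for a one-column result from cmdscale, and the tryCatch wrapper is only there to capture the message.

```r
# Hypothetical stand-in for a cmdscale() result with a single column
# (illustrative values, not computed from the question's data).
coords <- matrix(c(0.12, 0.00, -0.11, -0.02), ncol = 1)
points <- data.frame(coords)

# Assigning two column names to a one-column data frame raises an error;
# tryCatch() captures its message instead of stopping the script.
msg <- tryCatch(
  {
    colnames(points) <- c("x", "y")
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

The captured message should match the 'names' attribute error from the question, since the data frame has one column and two names are supplied.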

The network_plot code assumes that stats::cmdscale returns two columns (k = 2 by default), so it assigns two column names to the result. But cmdscale does not always return k columns. From the reference (?stats::cmdscale):

A set of Euclidean distances on n points can be represented exactly in at most n - 1 dimensions. cmdscale follows the analysis of Mardia (1978), and returns the best-fitting k-dimensional representation, where k may be less than the argument k.

> stats::cmdscale(distance)
              [,1]
paro   0.118717157
pub    0.004476784
priv  -0.106540786
total -0.016653154
Warning message:
In stats::cmdscale(distance) : only 1 of the first 2 eigenvalues are > 0

In your case, you get only one column back because only one eigenvalue of the inner product matrix is positive. Is there a way around it? No. This follows from the nature of your input data, namely the fact that the distances you provided are non-Euclidean. You can read up on this here.
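
One way to check how many dimensions cmdscale will return before plotting is sketched below, using base R only. It assumes the dissimilarity is 1 - |r| computed from Pearson correlations, which is my reading of the `abs(distance)` call in the error message rather than something confirmed from the package source. The names `camas`, `fit`, `n_dims` and `n_positive` are made up for this example.

```r
# Numeric columns from the question's data (CCAA labels dropped).
camas <- data.frame(
  paro = c(884121, 418438.25, 393648.25, 368107, 225404, 185089.75,
           175023.5, 151656.5, 125436, 109651.75, 106352, 76787.75,
           71575.5, 63432.75, 40508.5),
  pub = c(572, 511, 450, 479, 187, 155, 215, 180, 158, 97, 112, 86,
          113, 78, 40),
  priv = c(162, 141, 101, 225, 50, 13, 48, 20, 21, 5, 11, 7, 22, 46, 0),
  total = c(734, 652, 551, 704, 237, 168, 263, 200, 179, 102, 123, 93,
            135, 124, 40)
)

r <- stats::cor(camas)   # 4 x 4 Pearson correlation matrix
distance <- 1 - abs(r)   # ASSUMPTION: dissimilarity used for the layout

# eig = TRUE returns a list with the coordinates and the eigenvalues.
# The warning is suppressed here because the check below reports the
# same information explicitly.
fit <- suppressWarnings(stats::cmdscale(distance, k = 2, eig = TRUE))

n_dims <- ncol(fit$points)                  # may be smaller than k
n_positive <- sum(fit$eig[seq_len(2)] > 0)  # positive among the first 2

if (n_dims < 2) {
  message("cmdscale returned ", n_dims,
          " dimension(s); a two-column layout is not available.")
}
```

If `n_dims` is below 2, the two-name assignment in network_plot cannot succeed, which is consistent with the error above.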

Upvotes: 2
