Reputation: 500
I'm trying to make a network plot to show correlations. My code:
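library(dplyr)   # select() and %>%
library(corrr)   # correlate() and network_plot()

camas_desempleo %>%
  select(-CCAA) %>%   # drop the non-numeric region column
  correlate() %>%
  network_plot()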
Data:
> dput(camas_desempleo)
structure(list(CCAA = c("andalucía", "cataluña", "comunitat valenciana",
"madrid, comunidad de", "canarias", "castilla - la mancha", "galicia",
"castilla y león", "país vasco", "extremadura", "murcia, región de",
"asturias, principado de", "aragón", "balears, illes", "cantabria"
), paro = c(884121, 418438.25, 393648.25, 368107, 225404, 185089.75,
175023.5, 151656.5, 125436, 109651.75, 106352, 76787.75, 71575.5,
63432.75, 40508.5), pub = c(572, 511, 450, 479, 187, 155, 215,
180, 158, 97, 112, 86, 113, 78, 40), priv = c(162, 141, 101,
225, 50, 13, 48, 20, 21, 5, 11, 7, 22, 46, 0), total = c(734,
652, 551, 704, 237, 168, 263, 200, 179, 102, 123, 93, 135, 124,
40)), row.names = c(NA, -15L), class = c("tbl_df", "tbl", "data.frame"
))
The error given is the following:
Error in names(x) <- value :
  'names' attribute [2] must be the same length as the vector [1]
In addition: Warning message:
In stats::cmdscale(abs(distance)) :
  only 1 of the first 2 eigenvalues are > 0
I can't find anything in the documentation about this error, and a very similar approach using the same code works for the mtcars dataset.
Upvotes: 2
Views: 145
Reputation: 4233
The problem arises when network_plot (source code here) tries to execute the following step (lines 188-189):
points <- data.frame(stats::cmdscale(distance))
colnames(points) <- c("x", "y")
The network_plot code assumes that it will get a data frame with 2 columns (k = 2 by default in stats::cmdscale). But this is not necessarily true. From the reference (?stats::cmdscale):
A set of Euclidean distances on n points can be represented exactly in at most n - 1 dimensions. cmdscale follows the analysis of Mardia (1978), and returns the best-fitting k-dimensional representation, where k may be less than the argument k.
> stats::cmdscale(distance)
[,1]
paro 0.118717157
pub 0.004476784
priv -0.106540786
total -0.016653154
Warning message:
In stats::cmdscale(distance) : only 1 of the first 2 eigenvalues are > 0
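That single column is what trips the colnames step. As a minimal sketch (the coordinates are copied from the output above; `one_col` and `msg` are names made up for this illustration, not part of corrr), assigning two names to a one-column data frame should reproduce the error message from the question:

```r
# Coordinates copied from the cmdscale() output above: one column, four rows.
one_col <- matrix(c(0.118717157, 0.004476784, -0.106540786, -0.016653154),
                  ncol = 1,
                  dimnames = list(c("paro", "pub", "priv", "total"), NULL))
points <- data.frame(one_col)

# Same assignment as in network_plot: two names for a single column.
# tryCatch captures the error message instead of stopping the script.
msg <- tryCatch({
  colnames(points) <- c("x", "y")
  "no error"
}, error = function(e) conditionMessage(e))

print(msg)
```

So the `names` error is only a symptom; the underlying issue is that cmdscale returned fewer columns than network_plot expects.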
In your case, you get only one column back because only one eigenvalue of the inner product matrix is positive. Can you find a way around it? No. This has to do with the nature of your input data, namely, the fact that you provided non-Euclidean distances. You can read up on this here.
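If you want to check this for a given distance matrix, `stats::cmdscale` can return the eigenvalues when called with `eig = TRUE`. The sketch below uses a hypothetical 3-point toy matrix (not your data) that deliberately violates the triangle inequality, and the `1e-8` tolerance is an arbitrary choice for treating values near zero as non-positive:

```r
# Hypothetical toy distances: d(1,3) = 5 is larger than d(1,2) + d(2,3) = 2,
# so no Euclidean configuration of three points can produce them.
toy <- matrix(c(0, 1, 5,
                1, 0, 1,
                5, 1, 0), nrow = 3, byrow = TRUE)

# eig = TRUE makes cmdscale() return a list that includes the eigenvalues.
# The warning is suppressed here because the eigenvalues are inspected directly.
fit <- suppressWarnings(stats::cmdscale(as.dist(toy), k = 2, eig = TRUE))

# Count eigenvalues that are clearly positive (tolerance is an assumption).
n_positive <- sum(fit$eig > 1e-8)

print(fit$eig)
print(n_positive)
```

When `n_positive` is smaller than the `k` you asked for, a `k`-column layout cannot be relied on for that input.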
Upvotes: 2