Reputation: 3992
I'm trying to compare results I get from a PCA biplot with what I can get from MASS:isoMDS()
and vegan::metaMDS()
particularly using vectors representing the variables to help interpret the MDS dimensions.
My question is essentially how I can understand what is returned by the $species
component of metaMDS()
. How is it calculated and what do the values represent?
The terminology differs: sites x species seems to correspond to cases x variables.
For the heplots::Diabetes
data, the PCA is given by
data(Diabetes, package="heplots")
diab.pca <-
Diabetes |>
dplyr::select(where(is.numeric)) |>
prcomp(scale. = TRUE)
and a fancy biplot (using the ggbiplot
package) is shown below. The variable vectors are essentially the correlations of the observed variables with the dimensions.
Similarly, with MASS::isoMDS()
I can plot the solution, and then embed variable vectors, by explicitly calculating those correlations.
diab.dist <- dist(Diabetes[, 1:5])
mds <- diab.dist |>
MASS::isoMDS(k = 2, trace = FALSE) |>
purrr::pluck("points")
colnames(mds) <- c("Dim1", "Dim2")
library(ggpubr)
cols <- scales::hue_pal()(3) |> rev()
mplot <-
ggscatter(mds, x = "Dim1", y = "Dim2",
color = "group",
shape = "group",
palette = cols,
size = 2,
ellipse = TRUE,
ellipse.level = 0.5,
ellipse.type = "t") +
geom_hline(yintercept = 0, color = "gray") +
geom_vline(xintercept = 0, color = "gray")
vectors <- cor(Diabetes[, 1:5], mds[, 1:2])
scale_fac <- 500
mplot +
coord_fixed() +
geom_segment(data=vectors,
aes(x=0, xend=scale_fac*Dim1, y=0, yend=scale_fac*Dim2),
arrow = arrow(length = unit(0.2, "cm"), type = "closed"),
linewidth = 1.1) +
geom_text(data = vectors,
aes(x = 1.15*scale_fac*Dim1, y = 1.07*scale_fac*Dim2,
label=row.names(vectors)),
nudge_x = 4,
size = 4) +
theme(legend.position = "inside",
legend.position.inside = c(.8, .8))
With vegan::metaMDS()
, I thought that the species
component would give me something comparable t0 the variable vectors, and it mostly does, except for the vector for relwt
.
Here's what I do to try to create a metaMDS()
version.
library(vegan)
diab.mds <- metaMDS(diab.dist, distance="euclidean", k=2, trace = 0)
# get coordinates of points & vectors
points <- data.frame(diab.mds$points, group = Diabetes$group)
vectors <- data.frame(diab.mds$species)
cols <- scales::hue_pal()(3) |> rev()
ggplot(data=points, aes(x = MDS1, y = MDS2,
color = group, shape = group)) +
geom_point(size = 2) +
stat_ellipse(level = 0.68, linewidth=1.1) +
geom_hline(yintercept = 0) +
geom_vline(xintercept = 0) +
scale_color_manual(values = cols) +
geom_segment(data = vectors, aes(x=0, xend=MDS1, y=0, yend=MDS2,),
arrow = arrow(length = unit(0.2, "cm"), type = "closed"),
colour="black",
linewidth = 1.2,
inherit.aes = FALSE) +
geom_text(data=vectors, aes(x=MDS1, y=MDS2, label=rownames(vectors)),
size=5,
vjust = "outward",
inherit.aes = FALSE) +
coord_cartesian(clip = "off") +
theme_bw(base_size = 14) +
theme(legend.position = "inside",
legend.position.inside = c(.8, .8))
What I get from this is below. As you can see, the configuration of points & ellipses are similar
across these methods. The variable vectors are also similar, except that the one for relwt
points in the opposite direction from that in the PCA & isoMDS
Edit
My problem arose from not understanding how to translate vegan terminology of sites and species into what I'm familiar with, a data set of observations x variables. The solution here is to use envfit()
. This gives me what I want.
(fit <- envfit(diab.mds, Diabetes[, 1:5]))
plot(diab.mds)
plot(fit)
points(points[, 1:2], col = Diabetes$group)
car::dataEllipse(MDS2 ~ MDS1 | group, data=points,
col = cols,
levels = 0.5, add=TRUE)
Upvotes: 0
Views: 221