Kaiser
Kaiser

Reputation: 41

Why does metaMDS() produce a horizontal distribution of our data?

We have a species presence table (so binary: 1=present, 0=absent). When using metaMDS of the vegan package, it produces a horizontal distribution of our data when plotted, instead of clusters.

We tried using different distance methods (Euclidean, Bray, Jaccard), but they all seem to produce the same plot.

myfungi.all looks like this:

structure(list(Sample = 1:12, Habitat = structure(c(1L, 1L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Dune", "Forest"
), class = "factor"), OTU88 = c(0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 
1L, 1L, 1L, 1L), OTU28 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L), OTU165 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L), OTU178 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 
0L), OTU97 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L
), OTU39 = c(0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L), 
OTU104 = c(1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L
), OTU95 = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 
0L), OTU90 = c(1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L), OTU119 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L), OTU451 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 
0L), OTU98 = c(1L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L), OTU45 = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 
1L), OTU2 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 
1L), OTU24 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L), OTU169 = c(0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L), OTU29 = c(1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L), OTU85 = c(0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 
0L), OTU140 = c(1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 
0L), OTU42 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 
0L), OTU70 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 
0L), OTU25 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L), OTU34 = c(1L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 
1L), OTU181 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L), OTU201 = c(1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L), OTU17 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L), OTU1146 = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 
1L, 1L), OTU14 = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 
1L, 1L), OTU72 = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 
0L, 0L), OTU13 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 
1L, 1L), OTU20 = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 
1L, 1L), OTU63 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L), OTU170 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L), OTU262 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L), OTU48 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L), OTU6 = c(0L, 0L, 0L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 
0L, 0L), OTU3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 
1L, 1L), OTU31 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L), OTU73 = c(1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 
0L, 0L), OTU32 = c(0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 
0L, 0L), OTU37 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L), OTU196 = c(0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 
0L, 0L), OTU5 = c(1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 
0L, 0L), OTU11 = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 
0L, 1L), OTU16 = c(0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 
0L, 0L), OTU41 = c(0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 
0L, 0L), OTU71 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L), OTU109 = c(0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 
0L, 0L), OTU233 = c(0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L)), class = "data.frame", row.names = c(NA, -12L))

Our script looks like this:

myfungi.all = read.csv("soil_fungi.csv",header=T)
myfungi = myfungi.all[,c(3:51)]

myfungi.nmds.bc <- metaMDS(myfungi, distance = "bray", k = 2, binary = TRUE)
plot(myfungi.nmds.bc, type="t", main=paste("NMDS/Bray-Curtis -?? Stress =", round(myfungi.nmds.bc$stress,10)))

Does anyone have suggestions as what seems to be the problem?

At the moment our plot looks like this:

example of our NMDS plot

Upvotes: 4

Views: 6025

Answers (2)

Ben Bolker
Ben Bolker

Reputation: 226771

Best guess is that you simply don't have enough data to distinguish two separate environmental axes: when I run your code I get

Warning message: In metaMDS(myfungi[, -(1:2)], distance = "bray", k = 2, binary = TRUE) : stress is (nearly) zero: you may have insufficient data

Out of your 53 species, only 35 are informative (the others appear either at none or at all of the sites):

m2 <- myfungi[,apply(myfungi,2,var)>0]
ncol(m2) ## 35
vv <- function(x) (image(Matrix(as.matrix(x))))

How many distinct distribution patterns are there?

nrow(unique(t(m2)))  ## 27

You could try PCoA instead:

library(ape)
biplot(pcoa(vegdist(m2,"bray"))

enter image description here

As Jari Oksanen points out, you could also do this with cmdscale() in base R:

plot(cmdscale(vegdist(mm,"bray")),
     col=as.numeric(myfungi$Habitat))

Upvotes: 3

Jari Oksanen
Jari Oksanen

Reputation: 3702

The solution you reported gives a perfect fit (stress nearly 0), and also gives a warning because of this dubious stress. The solution effectively puts your sampling units into two points so that you have absolutely dichotomous data. As Ben Bolker demonstrated, Principal Coordinates Analysis, PCoA (which you also can perform with stats::cmdscale, vegan::wcmdscale or vegan::dbrda) still has points in two major cluster, but spreads points within these clusters. PCoA is a linear method, but NMDS is non-linear and therefore often needs more data. It seems that in this case the weak ties (read the documentation ?monoMDS or Kruskal's papers cited in that documentation) is the stage that puts most demand on the data, and setting weakties = FALSE will prevent collapsing non-identical observations into two points:

m3 <- metaMDS(myfungi, weakties = FALSE)
m3 # stress 0.04124
stressplot(m3) # compare this to your result stressplot(myfungi.nmds.bc)
plot(m3)

The default monoMDS with weakties = TRUE (like Kruskal recommended) will consider the dichotomy of two groups as the only important non-linear difference, but with weakties = FALSE the solutions cannot proceed to zero stress. You still have a dichotomy, but with scatter.

Upvotes: 6

Related Questions