Reputation: 21
I'm having trouble accurately defining my permutation design/hierarchy in the "permute" package in R.
Given a hypothetical set of plots, in which I've recorded species occurrences, I'd like to shuffle species within plots while maintaining the number of species in each plot, and also maintaining the overall abundance of individual species across the entire species pool.
Ultimately I'm trying to build a null distribution that is constrained at the plot level (n species per plot), and also at the overall species pool level (total observations of each species).
# build dataset representing the presence/absence of 10 species (columns)
# in 100 plots (rows)
set.seed(123)
dat = matrix(
sample(c(0,1), size = 100*10, replace = T, prob = c(0.75, 0.25)),
nrow = 100,
ncol = 10) # let this matrix represent the observed data
rowSums(dat) # represents the number of species present in each plot
colSums(dat) # represents the overall number of observations of each species
relative_abund = colSums(dat) / sum(dat)
# proportion of occurrences of each species in the entire species pool
# use "permute" package to shuffle species in plots
# while maintaining the total number of species in each plot
# and the relative abundance of all species in the species pool
library(permute)
# single permutation of "plot # 1" maintaining number of species per plot
dat[1, shuffle(dat[1,])]
# single permutation maintaining total observations of "species # 1"
dat[shuffle(dat[,1]), 1 ]
# use permutation design/control to shuffle data, such that
rowSums(permuted_dat) == rowSums(dat)
colSums(permuted_dat) == colSums(dat) # at least approximately
Upvotes: 1
Views: 163
Reputation: 21
Thanks - Gavin Simpson. The vegan package works great for this.
Here's an example for those wishing to preserve both row and column frequencies in the resulting null models. The ?commsim help file in "vegan" details different algorithms for specifying null models.
# load library
library(vegan)
# build hypothetical dataset of presence/absence data,
# where rows are plots and columns are species
dat = matrix(
sample(c(0,1), size = 100*10, replace = T, prob = c(0.75, 0.25)),
nrow = 100,
ncol = 10)
# build null models by reshuffling the dataset
nm_qs = nullmodel(dat, "quasiswap")
nm_qs$data # show presence/absence if original data were abundances
# 100 iterations/reshuffles
# using the "quasiswap" non-sequential algorithm for binary data
sm_qs = simulate(nm_qs, nsim = 100)
# confirm reshuffling preserved row frequencies (i.e., # of species per plot)
rowSums(nm_qs$data) == rowSums(sm_qs[1:100, 1:10, 1]) # first reshuffle
rowSums(nm_qs$data) == rowSums(sm_qs[1:100, 1:10, 2]) # second reshuffle
# confirm reshuffling preserved column frequencies (i.e., overall species abundances)
colSums(nm_qs$data) == colSums(sm_qs[1:100, 1:10, 1]) # first reshuffle
colSums(nm_qs$data) == colSums(sm_qs[1:100, 1:10, 1]) # second reshuffle
# view single reshuffled community
sm_qs[1:100, 1:10, 1]
# build distribution from null models as needed...
Upvotes: 1