Reputation: 67
I have community data collected by two sampling methods that I want to transform into a matrix (or two? I'm not sure which would be the correct input) for a downstream analysis with the vegan package. The goal is to compare how well each method detects community dissimilarity (using Bray-Curtis dissimilarity and ANOSIM).
Here are two example dataframes:
method1 <- data.frame(site = c('site1','site1','site1','site1','site2','site2','site2','site2','site3','site3','site3','site3'),
sampleID = c("site1.net1.2018", "site1.net2.2018","site1.net1.2019", "site1.net2.2019", "site2.net1.2018", "site2.net2.2018","site2.net1.2019","site2.net2.2019","site3.net1.2018", "site3.net2.2018", "site3.net1.2019", "site3.net2.2019"),
year = c("2018", "2018", "2019", "2019","2018", "2018", "2019", "2019","2018", "2018", "2019", "2019"),
species = c('Sp1','Sp2','Sp1','Sp3','Sp4','Sp2','Sp1','Sp2','Sp1','Sp3','Sp4','Sp2'),
abundance = c(1,7,1,6,2,5,2,1,6,3,2,1),
method = c("method1","method1","method1","method1","method1","method1","method1","method1","method1","method1","method1","method1"))
method2 <- data.frame(site = c('site1','site1','site1','site1','site2','site2','site2','site2','site3','site3','site3','site3'),
sampleID = c("site1.net1.2018", "site1.net2.2018","site1.net1.2019", "site1.net2.2019", "site2.net1.2018", "site2.net2.2018","site2.net1.2019","site2.net2.2019","site3.net1.2018", "site3.net2.2018", "site3.net1.2019", "site3.net2.2019"),
year = c("2018", "2018", "2019", "2019","2018", "2018", "2019", "2019","2018", "2018", "2019", "2019"),
species = c('Sp2','Sp4','Sp5','Sp1','Sp3','Sp1','Sp6','Sp1','Sp3','Sp4','Sp1','Sp5'),
abundance = c(2,1,3,3,5,2,10,6,4,2,1,1),
method = c("method2","method2","method2","method2","method2","method2","method2","method2","method2","method2","method2","method2"))
> head(method1)
site sampleID year species abundance method
1 site1 site1.net1.2018 2018 Sp1 1 method1
2 site1 site1.net2.2018 2018 Sp2 7 method1
3 site1 site1.net1.2019 2019 Sp1 1 method1
4 site1 site1.net2.2019 2019 Sp3 6 method1
5 site2 site2.net1.2018 2018 Sp4 2 method1
6 site2 site2.net2.2018 2018 Sp2 5 method1
It's unclear to me how the data should be formatted as a matrix for input into vegan, especially since there are multiple years, samples, and methods. For example, the vegan documentation shows the following, which indicates that a separate data frame is used for categorical/environmental variables:
data(dune)
data(dune.env)
dune.dist <- vegdist(dune)
attach(dune.env)
dune.ano <- anosim(dune.dist, Management)
This example has one community matrix covering multiple management types, but it's unclear to me whether I need one matrix for both sampling methods or a separate matrix for each, and how to coalesce the data into a binary presence/absence matrix organized by method, year, and sampleID.
Upvotes: 0
Views: 187
Reputation: 1793
The dune and dune.env data.frames do a pretty good job of illustrating the data structure that you need.
You want a community matrix, equivalent to dune, with sites in rows and species in columns. So something like this:
Species_1 Species_2 Species_3 Species_4
1 7 0 0
1 0 6 0
0 5 0 2
You would want a separate row for each site/year/method combination. (I can't really understand what the sampleID column in your data means, so the above table may be wrong.)
You also want a separate data.frame of independent variables, equivalent to dune.env, explaining the characteristics of each row in your community matrix (note that dune and dune.env have the same number of rows). So something like this:
Site Year Method
1 2018 method1
1 2019 method1
2 2018 method1
etc...
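One way to guarantee that this table lines up with the community matrix is to derive it from the matrix's row names. The sketch below assumes hypothetical row labels of the form site_year_method; row_ids and env are illustrative names only.

```r
# Hypothetical row labels of a community matrix ("site_year_method")
row_ids <- c("site1_2018_method1", "site1_2019_method1",
             "site2_2018_method1", "site1_2018_method2",
             "site1_2019_method2", "site2_2018_method2")

# Split each label on "_" and stack the pieces into a 3-column matrix
parts <- do.call(rbind, strsplit(row_ids, "_", fixed = TRUE))

# One row of explanatory variables per row of the community matrix,
# in the same order, stored as factors
env <- data.frame(
  Site   = factor(parts[, 1]),
  Year   = factor(parts[, 2]),
  Method = factor(parts[, 3]),
  row.names = row_ids
)
```

Because env is built from the row labels themselves, its row order cannot drift out of sync with the community matrix.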
You can then plan your analysis. You could easily use a function like adonis to test whether there are differences between the communities detected using method1 and method2, while accounting for Site and Year. However, you say you want to investigate "how well each method performs at detecting community dissimilarity" - do you have known communities that you're trying to detect? The exact analysis will depend on your aim.
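As a rough sketch of what such a test could look like on the example data from the question: the code below pools nets into one row per site/year/method (an assumption), then runs anosim on the method grouping and a permutation test with method, site and year as terms. I use adonis2 here, which is the function current vegan releases provide for this kind of test; treat that substitution, and the object names long, comm, env, dist_bc, ano and perm, as my assumptions rather than a prescribed workflow.

```r
library(vegan)

# Example data from the question, written compactly
sites <- rep(c("site1", "site2", "site3"), each = 4)
years <- rep(c("2018", "2018", "2019", "2019"), times = 3)
long <- rbind(
  data.frame(site = sites, year = years,
             species = c("Sp1", "Sp2", "Sp1", "Sp3", "Sp4", "Sp2",
                         "Sp1", "Sp2", "Sp1", "Sp3", "Sp4", "Sp2"),
             abundance = c(1, 7, 1, 6, 2, 5, 2, 1, 6, 3, 2, 1),
             method = "method1"),
  data.frame(site = sites, year = years,
             species = c("Sp2", "Sp4", "Sp5", "Sp1", "Sp3", "Sp1",
                         "Sp6", "Sp1", "Sp3", "Sp4", "Sp1", "Sp5"),
             abundance = c(2, 1, 3, 3, 5, 2, 10, 6, 4, 2, 1, 1),
             method = "method2")
)

# Assumption: one community-matrix row per site/year/method
long$row_id <- paste(long$site, long$year, long$method, sep = "_")
comm <- as.data.frame.matrix(
  xtabs(abundance ~ row_id + species, data = long)
)

# Matching table of explanatory variables, in the same row order
env <- unique(long[, c("row_id", "site", "year", "method")])
env <- env[match(rownames(comm), env$row_id), ]
stopifnot(identical(env$row_id, rownames(comm)))

set.seed(1)  # permutation tests are random; fix the seed to reproduce

# Bray-Curtis dissimilarities between all rows
dist_bc <- vegdist(comm, method = "bray")

# ANOSIM: do rows group by sampling method?
ano <- anosim(dist_bc, grouping = env$method, permutations = 999)

# Permutation test of method, with site and year also in the model
perm <- adonis2(comm ~ site + year + method, data = env,
                method = "bray", by = "margin", permutations = 999)
```

With only twelve rows, the example data are too small for the p-values to mean much; the point is only the shape of the inputs.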
Upvotes: 0