Fully Aquatic

Reputation: 67

Transform community data collected by two sampling methods into matrix for vegan

I have community data collected by two sampling methods. I want to transform it into a matrix (or two? I'm not sure which is the correct input) for a downstream analysis with the vegan package, comparing how well each method detects community dissimilarity (using Bray-Curtis dissimilarity and anosim).

Here are two example dataframes:

method1 <- data.frame(site = c('site1','site1','site1','site1','site2','site2','site2','site2','site3','site3','site3','site3'),
                    sampleID  = c("site1.net1.2018", "site1.net2.2018","site1.net1.2019", "site1.net2.2019", "site2.net1.2018", "site2.net2.2018","site2.net1.2019","site2.net2.2019","site3.net1.2018", "site3.net2.2018", "site3.net1.2019", "site3.net2.2019"),
                    year = c("2018", "2018", "2019", "2019","2018", "2018", "2019", "2019","2018", "2018", "2019", "2019"),
                    species = c('Sp1','Sp2','Sp1','Sp3','Sp4','Sp2','Sp1','Sp2','Sp1','Sp3','Sp4','Sp2'),
                    abundance = c(1,7,1,6,2,5,2,1,6,3,2,1),
                    method = c("method1","method1","method1","method1","method1","method1","method1","method1","method1","method1","method1","method1"))
                    
method2 <- data.frame(site = c('site1','site1','site1','site1','site2','site2','site2','site2','site3','site3','site3','site3'),
                      sampleID  = c("site1.net1.2018", "site1.net2.2018","site1.net1.2019", "site1.net2.2019", "site2.net1.2018", "site2.net2.2018","site2.net1.2019","site2.net2.2019","site3.net1.2018", "site3.net2.2018", "site3.net1.2019", "site3.net2.2019"),
                      year = c("2018", "2018", "2019", "2019","2018", "2018", "2019", "2019","2018", "2018", "2019", "2019"),
                      species = c('Sp2','Sp4','Sp5','Sp1','Sp3','Sp1','Sp6','Sp1','Sp3','Sp4','Sp1','Sp5'),
                      abundance = c(2,1,3,3,5,2,10,6,4,2,1,1),
                      method = c("method2","method2","method2","method2","method2","method2","method2","method2","method2","method2","method2","method2"))
> head(method1)
   site        sampleID year species abundance  method
1 site1 site1.net1.2018 2018     Sp1         1 method1
2 site1 site1.net2.2018 2018     Sp2         7 method1
3 site1 site1.net1.2019 2019     Sp1         1 method1
4 site1 site1.net2.2019 2019     Sp3         6 method1
5 site2 site2.net1.2018 2018     Sp4         2 method1
6 site2 site2.net2.2018 2018     Sp2         5 method1

It's unclear to me how the data should be formatted as a matrix for input into the vegan package, especially since there are multiple years, samples, and methods. For example, the vegan documentation shows the following, which suggests that a separate data frame is used for categorical/environmental variables:

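library(vegan)                 # provides vegdist(), anosim(), and the dune data
data(dune)                     # community matrix: sites in rows, species in columns
data(dune.env)                 # environmental variables, one row per site
dune.dist <- vegdist(dune)     # Bray-Curtis dissimilarity is the default
dune.ano <- anosim(dune.dist, dune.env$Management)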

This example has one community matrix for multiple management types, but it's unclear to me whether I need to make one matrix for both sampling methods or a separate matrix for each, and how to coalesce the data into a binary presence/absence matrix organized by method, year, and sampleID.

Upvotes: 0

Views: 187

Answers (1)

rw2

Reputation: 1793

The dune and dune.env data.frames do a pretty good job of illustrating the data structure that you need.

You want a community matrix, equivalent to dune, with sites in rows and species in columns. So something like this:

Species_1    Species_2    Species_3    Species_4
        1            7            0            0
        1            0            6            0
        0            5            0            2

You would want a separate row for each site/year/method combination. (I can't really understand what the sampleID column in your data means, so the above table may be wrong.)

You also want a separate data.frame of independent variables, equivalent to dune.env, explaining the characteristics of each row in your community matrix (note that dune and dune.env have the same number of rows). So something like this:

Site    Year    Method
   1    2018   method1
   1    2019   method1
   2    2018   method1

etc...
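As a rough sketch of how those two objects could be built from the long-format data in the question, here is one way using only base R. It assumes one row per site/year/method, so the two nets in each site and year are summed; the names `make_df`, `long`, `unit`, `comm`, `env`, and `comm_pa` are made up for this example, and the data are recreated in compact form so the block runs on its own.

```r
# Recreate the question's example data in compact form
sites <- rep(c("site1", "site2", "site3"), each = 4)
nets  <- rep(c("net1", "net2"), times = 6)
years <- rep(rep(c("2018", "2019"), each = 2), times = 3)

make_df <- function(species, abundance, method) {
  data.frame(site = sites,
             sampleID = paste(sites, nets, years, sep = "."),
             year = years,
             species = species,
             abundance = abundance,
             method = method)
}

method1 <- make_df(c("Sp1","Sp2","Sp1","Sp3","Sp4","Sp2","Sp1","Sp2","Sp1","Sp3","Sp4","Sp2"),
                   c(1, 7, 1, 6, 2, 5, 2, 1, 6, 3, 2, 1), "method1")
method2 <- make_df(c("Sp2","Sp4","Sp5","Sp1","Sp3","Sp1","Sp6","Sp1","Sp3","Sp4","Sp1","Sp5"),
                   c(2, 1, 3, 3, 5, 2, 10, 6, 4, 2, 1, 1), "method2")

# Stack both methods and label each sampling unit (assumed: site x year x method)
long <- rbind(method1, method2)
long$unit <- paste(long$site, long$year, long$method, sep = "_")

# Community matrix: xtabs() sums abundance per unit/species cell and
# fills species that were not recorded in a unit with 0
comm <- as.data.frame.matrix(xtabs(abundance ~ unit + species, data = long))

# Matching table of explanatory variables, in the same row order as comm
env <- unique(long[, c("unit", "site", "year", "method")])
env <- env[match(rownames(comm), env$unit), ]

# Binary presence/absence version of the same matrix
comm_pa <- (comm > 0) * 1
```

Keeping both methods in a single matrix, with `method` as a column of `env`, mirrors the dune/dune.env layout. If each net should be its own row instead, `sampleID` could be added to the `unit` label.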

You can then plan your analysis. You could use a function like adonis (adonis2 in current vegan releases) to test whether the communities detected using method1 and method2 differ, while accounting for Site and Year. However, you say you want to investigate "how well each method performs at detecting community dissimilarity" - do you have known communities that you're trying to detect? The exact analysis will depend on your aim.
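For illustration only, here is what such a test could look like with `adonis2()`, run on vegan's bundled dune data so that the block is self-contained. Here `Management` stands in for the sampling method and `Moisture` for a covariate such as site or year; this is a stand-in, not an analysis of the data in the question. With your own data the formula might be something like `comm ~ site + year + method` with `data = env`, where `comm` and `env` are hypothetical names for your community matrix and variable table.

```r
library(vegan)

data(dune)
data(dune.env)

set.seed(1)  # permutation tests are random; fix the seed for repeatable p-values

# PERMANOVA on Bray-Curtis dissimilarities.
# by = "margin" tests each term after accounting for the other term in the model.
fit <- adonis2(dune ~ Moisture + Management,
               data = dune.env,
               method = "bray",
               by = "margin",
               permutations = 999)
print(fit)
```

The `Management` row of the printed table would correspond to the method effect in your design. Whether this answers "how well each method performs" still depends on what you are comparing the methods against.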

Upvotes: 0
