Paris St
Paris St

Reputation: 33

Random subsampling in R

I am new in R, therefore my question might be really simple. I have a 40 sites with abundances of zooplankton.

My data looks like this (columns are species abundances and rows are sites)

0   0   0   0   0   2   0   0   0   
0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   85  0
0   0   0   0   0   45  5   57  0
0   0   0   0   0   13  0   3   0
0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   7   0
0   3   0   0   12  8   0   57  0
0   0   0   0   0   0   0   1   0
0   0   0   0   0   59  0   0   0
0   0   0   0   4   0   0   0   0
0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0
0   105 0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0
0   0   0   0   1   0   0   100 0
0   35  0   55  0   0   0   0   0
1   4   0   0   0   0   0   0   0
0   0   0   0   0   34  21  0   0
0   0   0   0   0   9   17  0   0
0   54  0   0   0   27  5   0   0
0   1   0   0   0   1   0   0   0
0   17  0   0   0   54  3   0   0

What I would like to is take a random sub-sample (e.g. 50 individuals) from each site without replacement several times (bootstrap) in order to calculate diversity indexes to the new standardized abundances afterwards.

Upvotes: 2

Views: 16879

Answers (4)

RPK
RPK

Reputation: 11

Just stumbled upon this thread, and the vegan package has a function called 'rrarify' that does precisely what you're looking to do (and in the same ecological context, too)

Upvotes: 1

Dinre
Dinre

Reputation: 4216

What the OP is probably looking for here is a way to bootstrap the data for a Hill or Simpson diversity index, which provides some assumptions about the data being sampled:

  • Each row is a site, each column is a species, and each value is a count.
  • Individuals are being sampled for the bootstrap, NOT THE COUNTS.

To do this, bootstrapping programs will often model the counts as a string of individuals. For instance, if we had a record like so:

a  b  c
2  3  4

The record would be modeled as:

aabbbcccc

Then, a sample is usually drawn WITH replacement from the string to create a larger set based on the model set.

Bootstrapping a site: In R, we have a way to do this that is actually quite simple with the 'sample' function. If you select from the column numbers, you can provide probabilities using the count data.

# Test data.
data <- data.frame(a=2, b=3, c=4)

# Sampling from first row of data.
row <- 1
N_samples <- 50

samples <- sample(1:ncol(data), N_samples, rep=TRUE, prob=data[row,])

Converting the sample into the format of the original table: Now we have an array of samples, with each item indicating the column number that the sample belongs to. We can convert back to the original table format in multiple ways, but here is a fairly simple one using a simple counting loop:

# Count the number of each entry and store in a list.
for (i in 1:ncol(data)){
    site_sample[[i]] <- sum(samples==i)
}

# Unlist the data to get an array that represents the bootstrap row.
site_sample <- unlist(site_sample)

Upvotes: 1

thompsor
thompsor

Reputation: 147

This should work. It's a little more complicated than it looks at first, since each cell contains counts of a species. The solution uses the apply function to send each row of the data to the user-defined sample_species function. Then we generate n random numbers and order them. If there are 15 of species 1, 20 of species 2, and 20 of species 3, the random numbers generated between 1 and 15 signify species 1, 16 and 35 signify species 2, and 36-55 signify species 3.

## Initially takes in a row of the data and the number of samples to take
sample_species <- function(counts,n) {
  num_species <- length(counts)
  total_count <- sum(counts)
  samples <- sample(1:total_count,n,replace=FALSE)
  samples <- samples[order(samples)]
  result <- array(0,num_species)
  total <- 0
  for (i in 1:num_species) {
    result[i] <- length(which(samples > total & samples <= total+counts[i]))
    total <- total+counts[i]
  }
  return(result)
}

A <- matrix(sample(0:100,10*40,replace=T), ncol=10) ## mock data
B <- t(apply(A,1,sample_species,50)) ## results

Upvotes: 0

andreister
andreister

Reputation: 13871

Try something like this:

mysample <- mydata[sample(1:nrow(mydata), 50, replace=FALSE),]

Upvotes: 1

Related Questions