Querying one R data frame by matching multiple columns from a second data frame

Question

Say I have a data set, d1, that describes the abundance of different species at different sites:

site <- c(1:5)
species1 <- c('A','A','B','C','A')
abundance1 <- c(0.11,0.45,0.87,1.00,0.23)
species2 <- c('B','C','A','A','C')
abundance2 <- 1 - abundance1
d1 <- data.frame(site,species1,abundance1,species2,abundance2)

So, each site has two species, and each species has an abundance column that gives the proportion of the total community it represents.

I then have a second data set, d2, that describes a trait measurement, for instance weight, of each species within each site. So, species A at site 1 may have a different observation of weight than species A at site 2. The data frame d2 looks like this:

site <- c(1,1,2,2,3,3,4,4,5,5)
species <- c('A','B','A','C','B','A','C','A','A','C')
weight <- rnorm(10, 50, 4)
d2 <- data.frame(site,species,weight)

I would like to generate a column within d1 that is the abundance-weighted average of weight, using the weight data in d2, such that each species within a site is assigned its own observation of weight in the final calculation.

The expected value for the first entry of the new vector would be the result of the expression:

d1[1,3]*d2[1,3] + d1[1,5]*d2[2,3]
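
Written with lookups by name instead of by row position, the same value for site 1 could be sketched as follows. The helper `w` and the name `expected1` are hypothetical, introduced only for illustration, and `set.seed` is an added assumption so that the random weights are reproducible:

```r
set.seed(1)  # assumption: fixed seed, not part of the original data
d1 <- data.frame(site = 1:5,
                 species1 = c('A','A','B','C','A'),
                 abundance1 = c(0.11,0.45,0.87,1.00,0.23),
                 species2 = c('B','C','A','A','C'))
d1$abundance2 <- 1 - d1$abundance1
d2 <- data.frame(site = c(1,1,2,2,3,3,4,4,5,5),
                 species = c('A','B','A','C','B','A','C','A','A','C'),
                 weight = rnorm(10, 50, 4))

# hypothetical helper: weight observed for species `sp` at site `s`
w <- function(s, sp) d2$weight[d2$site == s & d2$species == sp]

# abundance-weighted average of weight for site 1 (species A and B)
expected1 <- d1$abundance1[1] * w(1, 'A') + d1$abundance2[1] * w(1, 'B')
```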

intra · Accepted Answer

Old-school R. There may be an easier way with other packages, but this is a straightforward use of apply.

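# apply() passes each row as a character vector here, so numeric fields are converted back with as.numeric()
d1$newvec <- apply(d1, 1, function(x)
                   d2[d2$site == as.numeric(x[1]) & d2$species == x[2], 'weight'] * as.numeric(x[3]) +
                   d2[d2$site == as.numeric(x[1]) & d2$species == x[4], 'weight'] * as.numeric(x[5]))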
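
As an alternative that stays in base R, the lookup could also be vectorized with match on a combined site/species key. This is only a sketch under a few assumptions: the names `key2`, `w1` and `w2` are made up for illustration, the data is rebuilt so that the block runs on its own, and `set.seed` is added for reproducibility.

```r
set.seed(1)  # assumption: fixed seed, not part of the original data
d1 <- data.frame(site = 1:5,
                 species1 = c('A','A','B','C','A'),
                 abundance1 = c(0.11,0.45,0.87,1.00,0.23),
                 species2 = c('B','C','A','A','C'))
d1$abundance2 <- 1 - d1$abundance1
d2 <- data.frame(site = c(1,1,2,2,3,3,4,4,5,5),
                 species = c('A','B','A','C','B','A','C','A','A','C'),
                 weight = rnorm(10, 50, 4))

# one "site:species" key per row of d2
key2 <- paste(d2$site, d2$species, sep = ":")

# weight of each site's first and second species, looked up by key
w1 <- d2$weight[match(paste(d1$site, d1$species1, sep = ":"), key2)]
w2 <- d2$weight[match(paste(d1$site, d1$species2, sep = ":"), key2)]

# abundance-weighted average of weight per site
d1$newvec <- d1$abundance1 * w1 + d1$abundance2 * w2
```

With this approach, a site/species pair that is missing from d2 gives NA in newvec for that row. It also assumes each site/species pair occurs only once in d2, since match returns the first hit.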
