colin

Reputation: 2666

Querying one R data frame by matching multiple columns from a second data frame

Say I have a data set, d1, that describes the abundance of different species at different sites:

site <- 1:5
species1 <- c('A', 'A', 'B', 'C', 'A')
abundance1 <- c(0.11, 0.45, 0.87, 1.00, 0.23)
species2 <- c('B', 'C', 'A', 'A', 'C')
abundance2 <- 1 - abundance1
d1 <- data.frame(site, species1, abundance1, species2, abundance2)

So, each site has two species, and there is an abundance column that describes the proportion of the total community each species represents.

I then have a second data set, d2, that describes a trait measurement, for instance weight, for each species within a site. So, species A at site 1 may have a different observation of weight than species A at site 2. The data frame d2 looks like this:

site <- c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5)
species <- c('A', 'B', 'A', 'C', 'B', 'A', 'C', 'A', 'A', 'C')
weight <- rnorm(10, 50, 4)
d2 <- data.frame(site, species, weight)

I would like to generate a column within d1 that is the abundance-weighted average of weight, using the weight data in d2, such that each species within a site is assigned its own observation of weight in the final calculation.

The expected value for the first entry of the new column would be the result of the expression:

d1[1,3]*d2[1,3] + d1[1,5]*d2[2,3]

Upvotes: 0

Views: 60

Answers (1)

intra

Reputation: 376

Old-school R. There may be an easier way with other packages, but this is a straightforward apply().

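# apply() hands each row over as a character vector, so numeric fields are converted back.
d1$newvec <- apply(d1, 1, function(x)
  d2[d2$site == as.numeric(x['site']) & d2$species == x['species1'], 'weight'] * as.numeric(x['abundance1']) +
  d2[d2$site == as.numeric(x['site']) & d2$species == x['species2'], 'weight'] * as.numeric(x['abundance2']))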
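
If row-wise apply() feels heavy, the same lookup can probably be vectorized in base R. The following is only a minimal sketch: it matches each (site, species) pair against d2 through a pasted key. The helper name lookup_weight and the seed value are my own assumptions for illustration, not something from the question.

```r
# Rebuild the example data from the question (seed chosen arbitrarily for reproducibility).
set.seed(42)
d1 <- data.frame(site = 1:5,
                 species1 = c("A", "A", "B", "C", "A"),
                 abundance1 = c(0.11, 0.45, 0.87, 1.00, 0.23),
                 species2 = c("B", "C", "A", "A", "C"))
d1$abundance2 <- 1 - d1$abundance1
d2 <- data.frame(site = rep(1:5, each = 2),
                 species = c("A", "B", "A", "C", "B", "A", "C", "A", "A", "C"),
                 weight = rnorm(10, 50, 4))

# Hypothetical helper: weight for each (site, species) pair,
# NA when the pair does not occur in d2.
lookup_weight <- function(site, species) {
  key <- paste(d2$site, d2$species)
  d2$weight[match(paste(site, species), key)]
}

# Abundance-weighted average of the two species' weights at each site.
d1$newvec <- d1$abundance1 * lookup_weight(d1$site, d1$species1) +
             d1$abundance2 * lookup_weight(d1$site, d1$species2)
```

The first element of d1$newvec should agree with d1[1,3]*d2[1,3] + d1[1,5]*d2[2,3] from the question. A (site, species) pair that is missing from d2 gives NA in the corresponding row.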

Upvotes: 1
