jessi
jessi

Reputation: 1518

Multiply column values in one data.frame by column in another data.frame on a condition in R

I have two data frames in r that I am trying to combine based on the values in a column for each.

df1=data.frame(comp=c("comp1", "comp2", "comp3","comp1"),
 state1=c(1,0,0,1),
 state2=c(1,1,0,1),
 state3=c(0,1,1,0),
 state4=c(0,0,1,0),year=c(1,1,1,2))

   comp state1 state2 state3 state4 year
1 comp1      1      1      0      0    1
2 comp2      0      1      1      0    1
3 comp3      0      0      1      1    1
4 comp1      1      1      0      0    2

df2=data.frame(state=c("state1","state2", "state3", "state4", 
                       "state1","state2", "state3", "state4"), 
 var1=c(1,0,0,1,0,0,1,1), 
 var2=c(0,1,0,0,0,1,1,0), 
 year=c(1,1,1,1,2,2,2,2))

df2

    state var1 var2 year
1 state1    1    0    1
2 state2    0    1    1
3 state3    0    0    1
4 state4    1    0    1
5 state1    0    1    2
6 state2    0    1    2
7 state3    1    1    2
8 state4    1    0    2

I'd like to append columns to df1 that are var1, var2 which is the mean of all states for that comp.

so, var1 for comp1 should be 1*1+1*0+0*0+0*1/(1+1) or state*var/sum(state for comp) by year.

df3 would look like:

         state1 state2 state3 state4 year  var1  var2
    1 comp1  1      1      0      0     1   0.5   0.5
    2 comp2  0      1      1      0     1   0.0   0.5
    3 comp3  0      0      1      1     1   0.5   0.0
    4 comp1  1      1      0      0     2   0.5   1.0

Is this possible? I tried to use ddply with mean of var1, summarizing by comp and year, but that doesn't work. I end up with more than one row per comp per year.

Thanks in advance. This one is the most similar to my problem, but it doesn't show a conditional in the second data set. Multiply various subsets of a data frame by different vectors

Please advise.

Upvotes: 0

Views: 1537

Answers (1)

IRTFM
IRTFM

Reputation: 263342

My hope is that by breaking this into segments you can find out why my results look different than your prediction:

 df3 <- matrix(NA, ncol=2, nrow=nrow(df1))
 for (i in seq(nrow(df1))) {
     df3[i, 1] <- sum(df2[ df2$year==df1$year[i], "var1"] * df1[i, 2:5])
     df3[i, 2] <- sum(df2[ df2$year==df1$year[i], "var2"] * df1[i, 2:5])
 }
 m4<-df3/rowSums(df1[2:5])
 cbind(df1, m4)
#---------------
   comp state1 state2 state3 state4 year   1         2
1 comp1      1      1      0      0    1 0.5 0.5000000
2 comp2      0      1      1      0    1 0.0 0.3333333
3 comp3      0      0      1      1    1 0.5 0.0000000
4 comp1      1      1      0      0    2 0.0 0.3333333

Seems to match up ok on "var1" entries and I'm hoping you just threw in some guesses for "var2".

Upvotes: 1

Related Questions