Logit

Reputation: 591

Using apply() with a custom function and a second data frame

I'm trying to generate predicted values from a large number of model simulations, and I'm having a hard time doing it simply. I suspect I need something from the apply() family, but I can't figure out the syntax. Maybe my understanding of apply() is weak, or maybe my function is wrong. Any suggestions?

Suppose I've got the following coefficients resulting from six model simulations...

coef <- data.frame(intercept=c(2,3,5,7,2,1),
                   b1 = c(.2,.5,.6,.7,.9,.4),
                   b2 = c(10,11,12,11,9,10))

I want to compute the predicted values, i.e. the linear combination of each row above with each row of the following data frame...

df <- data.frame(age = c(50,20,19, 42),
                 height = c(60,72,79, 66))

...using the following model equation:

coef$intercept + coef$b1*df$age + coef$b2*df$height

Done right, I should get the following 24 values (one row per simulation, one column per row of df):

612.0   726.0   795.8   670.4
688.0   805.0   881.5   750.0
755.0   881.0   964.4   822.2
702.0   813.0   889.3   762.4
587.0   668.0   730.1   633.8
621.0   729.0   798.6   677.8
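To make the indexing behind this table concrete, here is a minimal sketch that computes a single entry by hand: entry [i, j] pairs coefficient row i with data row j. The names i, j and pred_11 are only illustrative.

```r
# Coefficients and new data from the question
coef <- data.frame(intercept = c(2, 3, 5, 7, 2, 1),
                   b1 = c(.2, .5, .6, .7, .9, .4),
                   b2 = c(10, 11, 12, 11, 9, 10))
df <- data.frame(age = c(50, 20, 19, 42),
                 height = c(60, 72, 79, 66))

# Entry [i, j]: coefficient row i (simulation) with data row j (observation)
i <- 1; j <- 1
pred_11 <- coef$intercept[i] + coef$b1[i] * df$age[j] + coef$b2[i] * df$height[j]
pred_11  # 2 + 0.2 * 50 + 10 * 60 = 612
```

The full table is just this calculation repeated for all 6 x 4 combinations of i and j.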

To get the above, I've tried the following function and use of apply()...

equation <-  function(...) coef$intercept + coef$b1*df$age + coef$b2*df$height
result <- apply(df, 1, equation)

...but I don't get the correct answer. The result matrix just repeats the correct diagonal values. I also get this warning:

Warning messages:
1: In coef$b1 * df$age :
  longer object length is not a multiple of shorter object length
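The warning comes from R's element-wise arithmetic: coef$b1 has 6 elements and df$age has 4, so df$age is recycled to length 6 instead of every coefficient row being paired with every data row. The sketch below reproduces this with the question's own equation() (warnings are suppressed only to keep the output short) and shows why every column of the apply() result is the same.

```r
coef <- data.frame(intercept = c(2, 3, 5, 7, 2, 1),
                   b1 = c(.2, .5, .6, .7, .9, .4),
                   b2 = c(10, 11, 12, 11, 9, 10))
df <- data.frame(age = c(50, 20, 19, 42),
                 height = c(60, 72, 79, 66))

# The question's function: it ignores its argument and multiplies whole
# columns element-wise (length-6 coef columns times length-4 df columns).
equation <- function(...) coef$intercept + coef$b1 * df$age + coef$b2 * df$height

# Recycling pairs coef rows 1..6 with df rows 1, 2, 3, 4, 1, 2, so the first
# four values are the "diagonal" of the expected table.
one_call <- suppressWarnings(equation())
one_call  # values: 612, 805, 964.4, 762.4, 587, 729

# apply() calls equation() once per df row and gets the same length-6 vector
# every time, so the 6 x 4 result has four identical columns.
res <- suppressWarnings(apply(df, 1, equation))
dim(res)              # 6 4
all(res == res[, 1])  # TRUE
```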

Yes, I can get the correct answer through simple matrix multiplication:

df$ones <- 1
df <- df[,c(3, 1, 2)]
result <- as.matrix(coef) %*% t(as.matrix(df))

But it seems to me one ought to be able to do this more generally using apply() and a custom function. Using apply() would be more compact and would put me at less risk of having my matrix columns in the wrong order. Any suggestions?
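One way to get exactly what the question asks for, a custom function plus a second data frame, is to loop apply() over the rows of coef and pass df through apply()'s ... argument, which forwards extra named arguments to FUN. This is a sketch under the assumption that the column names intercept, b1, b2, age and height are fixed; predict_row and newdata are hypothetical names. Indexing by name rather than by position addresses the column-order worry.

```r
coef <- data.frame(intercept = c(2, 3, 5, 7, 2, 1),
                   b1 = c(.2, .5, .6, .7, .9, .4),
                   b2 = c(10, 11, 12, 11, 9, 10))
df <- data.frame(age = c(50, 20, 19, 42),
                 height = c(60, 72, 79, 66))

# Hypothetical helper. `b` is one row of coef: apply() converts the data frame
# to a matrix, so each row arrives as a named numeric vector. `newdata` is
# the second data frame. Lookup by name makes column order irrelevant.
predict_row <- function(b, newdata) {
  b[["intercept"]] + b[["b1"]] * newdata$age + b[["b2"]] * newdata$height
}

# newdata = df is passed on to predict_row() through apply()'s `...`.
# apply() stacks the length-4 results as columns, so t() puts one
# simulation per row, matching the layout in the question.
result <- t(apply(coef, 1, predict_row, newdata = df))
dim(result)  # 6 4
```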

Upvotes: 0

Views: 1434

Answers (3)

JdeMello

Reputation: 1718

Here is what I'd do:

sapply(seq_len(nrow(coef)), function(x) {
  # outer loop: one output column per coefficient row x
  sapply(seq_len(nrow(df)), function(y) {
    # inner loop: one output row per data row y
    coef$intercept[[x]] + coef$b1[[x]]*df$age[[y]] + coef$b2[[x]]*df$height[[y]]
  })
})

Result:

     [,1]  [,2]  [,3]  [,4]  [,5]  [,6]
[1,] 612.0 688.0 755.0 702.0 587.0 621.0
[2,] 726.0 805.0 881.0 813.0 668.0 729.0
[3,] 795.8 881.5 964.4 889.3 730.1 798.6
[4,] 670.4 750.0 822.2 762.4 633.8 677.8

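Use two nested sapply() calls, one for each object (coef and df). Note that the result has one row per row of df and one column per row of coef, so wrap it in t() if you want the layout from the question.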
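The nested sapply() calls effectively loop over every (coefficient row, data row) pair. As a swapped-in alternative (not part of this answer), outer() builds that grid of index pairs in one call. It passes whole vectors of indices to the function, so the function body must be vectorised, which plain indexed arithmetic is. A minimal sketch; pred is a hypothetical name.

```r
coef <- data.frame(intercept = c(2, 3, 5, 7, 2, 1),
                   b1 = c(.2, .5, .6, .7, .9, .4),
                   b2 = c(10, 11, 12, 11, 9, 10))
df <- data.frame(age = c(50, 20, 19, 42),
                 height = c(60, 72, 79, 66))

# outer() expands the two index vectors into all 24 (i, j) pairs and calls
# the function once with length-24 vectors i and j.
# Rows of the result index coef, columns index df.
pred <- outer(seq_len(nrow(coef)), seq_len(nrow(df)), function(i, j) {
  coef$intercept[i] + coef$b1[i] * df$age[j] + coef$b2[i] * df$height[j]
})
dim(pred)  # 6 4, the same layout as the table in the question
```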

Upvotes: 1

akrun

Reputation: 887223

We can do this with %*%, adding the intercept column separately so that no column of ones is needed:

coef[,1] + as.matrix(coef[-1]) %*% t(df)
#     [,1] [,2]  [,3]  [,4]
#[1,]  612  726 795.8 670.4
#[2,]  688  805 881.5 750.0
#[3,]  755  881 964.4 822.2
#[4,]  702  813 889.3 762.4
#[5,]  587  668 730.1 633.8
#[6,]  621  729 798.6 677.8
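This answer relies on the slope columns of coef (b1, b2) being in the same order as the columns of df (age, height). If column order is a concern, as in the question, one option is to select the columns by name before multiplying. A sketch; slopes, X and pred are hypothetical names, and the b1/age, b2/height pairing is taken from the model equation in the question.

```r
coef <- data.frame(intercept = c(2, 3, 5, 7, 2, 1),
                   b1 = c(.2, .5, .6, .7, .9, .4),
                   b2 = c(10, 11, 12, 11, 9, 10))
df <- data.frame(age = c(50, 20, 19, 42),
                 height = c(60, 72, 79, 66))

# Pick columns by name so slopes and predictors line up regardless of order.
slopes <- as.matrix(coef[, c("b1", "b2")])     # 6 x 2
X      <- as.matrix(df[, c("age", "height")])  # 4 x 2

# (6 x 2) %*% (2 x 4) gives 6 x 4; the length-6 intercept vector is
# recycled down each column, so row i gets intercept i.
pred <- coef$intercept + slopes %*% t(X)
```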

Upvotes: 3

Yannis Vassiliadis

Reputation: 1709

If you really want to use apply, you can do this:

result <- t(apply(coef, 1, function(x) x[1] + x[2]*df$age + x[3]*df$height))
> result
     [,1] [,2]  [,3]  [,4]
[1,]  612  726 795.8 670.4
[2,]  688  805 881.5 750.0
[3,]  755  881 964.4 822.2
[4,]  702  813 889.3 762.4
[5,]  587  668 730.1 633.8
[6,]  621  729 798.6 677.8

But it's really preferable (and faster) to do the matrix multiplication.
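A quick way to confirm that the apply() version and the matrix-multiplication version from the other answer agree is a sketch like the following (by_apply and by_matrix are hypothetical names). all.equal() compares with a small numeric tolerance, and check.attributes = FALSE ignores any differences in dimnames.

```r
coef <- data.frame(intercept = c(2, 3, 5, 7, 2, 1),
                   b1 = c(.2, .5, .6, .7, .9, .4),
                   b2 = c(10, 11, 12, 11, 9, 10))
df <- data.frame(age = c(50, 20, 19, 42),
                 height = c(60, 72, 79, 66))

# Positional indexing, as in the answer: x[1] intercept, x[2] b1, x[3] b2
by_apply  <- t(apply(coef, 1, function(x) x[1] + x[2] * df$age + x[3] * df$height))
# Matrix version: intercept added separately, slopes times transposed data
by_matrix <- coef[, 1] + as.matrix(coef[-1]) %*% t(df)

all.equal(by_apply, by_matrix, check.attributes = FALSE)  # TRUE
```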

Upvotes: 3
