Rich Scriven

Reputation: 99331

Function list over a list of data frames without nesting apply()

With the data dat below, I'm trying to get the following result, but without nesting
sapply() inside lapply() the way this call does.

> lapply(dat, function(x) sapply(funs, function(y) y(x)))
# $bondsba01
#   AVG   SLG 
# 0.223 0.300 
#
# $pujolal01
#   AVG   SLG 
# 0.329 0.422 

I'm familiar with rapply(), but I'm having trouble applying it to this list. I figured that since dat is a list of data frames, it amounts to a list of lists, so rapply() should be appropriate.

I've tried a few variations of rapply(), and get the same error almost every time.

> rapply(funs, function(x) x(dat), how = "replace")
#  Error in eval(expr, envir, enclos) : object 'H' not found 

I get the same error with how = "list" and how = "unlist". How can I do this without nesting sapply() inside lapply()?

Sample Data:

dat <- 
structure(list(bondsba01 = structure(list(AB = 413L, R = 72L, 
    H = 92L, X2B = 26L, X3B = 3L, HR = 16L, RBI = 48L, SB = 36L, 
    CS = 7L, BB = 65L, SO = 102L, IBB = 2L, HBP = 2L, SH = 2L, 
    SF = 2L), .Names = c("AB", "R", "H", "X2B", "X3B", "HR", 
"RBI", "SB", "CS", "BB", "SO", "IBB", "HBP", "SH", "SF"), row.names = 1L, 
    class = "data.frame"), 
    pujolal01 = structure(list(AB = 590L, R = 112L, H = 194L, 
        X2B = 47L, X3B = 4L, HR = 37L, RBI = 130L, SB = 1L, CS = 3L, 
        BB = 69L, SO = 93L, IBB = 6L, HBP = 9L, SH = 1L, SF = 7L), 
    .Names = c("AB", "R", "H", "X2B", "X3B", "HR", "RBI", "SB", "CS", "BB",
    "SO", "IBB", "HBP", "SH", "SF"), row.names = 1L, class = "data.frame")),
    .Names = c("bondsba01", "pujolal01"))

Function List:

funs <- 
structure(list(AVG = function (x) 
with(x, round(H/AB, 3)), SLG = function (x) 
with(x, round(((H - X2B - X3B - HR) + 2 * X2B + 3 * X3B + HR)/AB, 
    3))), .Names = c("AVG", "SLG"))

Link to the actual data.

Upvotes: 0

Views: 137

Answers (1)

Andrie

Reputation: 179408

Just because it's Saturday morning, and I'm in the mood to experiment with foreach, here is a solution:

library(foreach)
library(iterators)

foreach(x=iter(dat), .combine=cbind) %:% 
  foreach(f=iter(funs), .combine=c)  %do% 
  f(x)


     result.1 result.2
[1,]    0.223    0.329
[2,]    0.300    0.422
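
The columns come back as result.1 and result.2 because cbind has no names to work with. A minimal sketch of one way to restore the labels from the question's expected output, assuming a trimmed copy of dat that keeps only the columns the two functions use (the trimming and the name res are illustrative, not part of the original data):

```r
library(foreach)
library(iterators)

# Trimmed copy of the question's data: only the columns the functions need.
dat <- list(
  bondsba01 = data.frame(AB = 413L, H = 92L, X2B = 26L, X3B = 3L, HR = 16L),
  pujolal01 = data.frame(AB = 590L, H = 194L, X2B = 47L, X3B = 4L, HR = 37L)
)

# Same function bodies as in the question.
funs <- list(
  AVG = function(x) with(x, round(H / AB, 3)),
  SLG = function(x) with(x, round(((H - X2B - X3B - HR) + 2 * X2B + 3 * X3B + HR) / AB, 3))
)

res <- foreach(x = iter(dat), .combine = cbind) %:%
  foreach(f = iter(funs), .combine = c) %do%
  f(x)

# Rows follow the inner loop (functions), columns the outer loop (players).
dimnames(res) <- list(names(funs), names(dat))

# Transpose so there is one row per player and one column per statistic.
res <- t(res)
res
```

After the transpose, res["bondsba01", "AVG"] is 0.223 and res["pujolal01", "SLG"] is 0.422, matching the values in the question.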

The foreach version should be reasonably fast, but more importantly, it is easy to parallelise. You only have to make two changes:

  • Load your preferred parallel package (I use doParallel) and register the cluster
  • Change %do% to %dopar%

Like this:

library(doParallel)
cl <- makePSOCKcluster(2)
registerDoParallel(cl)
foreach(x=iter(dat), .combine=cbind) %:% 
  foreach(f=iter(funs), .combine=c)  %dopar% 
  f(x)

     result.1 result.2
[1,]    0.223    0.329
[2,]    0.300    0.422

stopCluster(cl)
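
As an aside, the rapply() call in the question fails because it hands the whole list dat to each function, and with() cannot find H in a list whose only names are bondsba01 and pujolal01. If foreach is not needed, one possible base R sketch avoids the nested apply by stacking the data frames first. It assumes every function in funs is vectorised over rows, which holds for the two shown; the trimmed dat and the names all_players and res are illustrative:

```r
# Trimmed copy of the question's data: only the columns the functions need.
dat <- list(
  bondsba01 = data.frame(AB = 413L, H = 92L, X2B = 26L, X3B = 3L, HR = 16L),
  pujolal01 = data.frame(AB = 590L, H = 194L, X2B = 47L, X3B = 4L, HR = 37L)
)

# Same function bodies as in the question.
funs <- list(
  AVG = function(x) with(x, round(H / AB, 3)),
  SLG = function(x) with(x, round(((H - X2B - X3B - HR) + 2 * X2B + 3 * X3B + HR) / AB, 3))
)

# Stack the one-row data frames so each function runs once over all players.
all_players <- do.call(rbind, dat)

# A single sapply() over the functions; each call returns one value per player.
res <- sapply(funs, function(f) f(all_players))

# Label the rows explicitly rather than relying on rbind's row names.
rownames(res) <- names(dat)
res
```

With two or more players this gives a matrix with one row per player and one column per statistic. With a single player sapply() would return a plain vector, so the rownames() step would need adjusting.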

Upvotes: 3