flodel

Reputation: 89057

Split data frame, apply function, and return results in a nested list

My question's title almost matches the dlply (plyr package) description, except for the "nested" part.

Let me explain with an example:

library(plyr)
res <- dlply(mtcars, c("gear", "carb"), identity)
head(res, 2)
# $`3.1`
#                 mpg cyl  disp  hp drat    wt  qsec vs am gear carb
# Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
# Valiant        18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
# Toyota Corona  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
# 
# $`3.2`
#                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
# Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
# Dodge Challenger  15.5   8  318 150 2.76 3.520 16.87  0  0    3    2
# AMC Javelin       15.2   8  304 150 3.15 3.435 17.30  0  0    3    2
# Pontiac Firebird  19.2   8  400 175 3.08 3.845 17.05  0  0    3    2

As you can see, the output is a list where the names (keys) are the concatenation of the two variables I used for splitting the data, e.g. "3.1" is the key for (gear = 3, carb = 1).
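Base R's split() builds the same kind of flat, dot-joined keys when it is given several grouping factors, which makes the lookup pattern easy to see without plyr. A small sketch (the variable names flat and key are only for illustration, and drop = TRUE is assumed so that empty gear/carb combinations are left out):

```r
# Split on both variables at once; names have the form "<gear>.<carb>"
flat <- split(mtcars, list(mtcars$gear, mtcars$carb), drop = TRUE)

# A group can only be reached through the combined key
key <- paste(3, 1, sep = ".")
flat[[key]]

# There is no intermediate level to index by gear alone
is.null(flat[["3"]])
```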

Instead, I would like my result to be a nested list, so that the elements can be accessed through two sets of keys, one for each of my splitting variables: res[["3"]][["1"]].

Is there something around, not necessarily from the plyr package, that can achieve this? I'd like the answer to be generalizable to any number of splitting variables. Also, it is important that I can apply any function although my example used the identity function, resulting in a mere split of the data. Thank you for your suggestions.

Upvotes: 1

Views: 1783

Answers (2)

flodel

Reputation: 89057

I came up with a solution myself; it uses recursion:

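nested.dlply <- function(df, by, fun, ...) {

   # Load plyr; library() stops with an error if it is not installed
   library(plyr)

   if (length(by) == 1) {
      # Base case: one splitting variable left, so apply fun to each piece
      dlply(df, by, fun, ...)
   } else {
      # Recursive case: split on the first variable, then recurse on the rest
      dlply(df, by[1], nested.dlply, by[-1], fun, ...)
   }
}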

Here are a couple of examples:

nested.dlply(mtcars, c("gear", "carb"), identity)
# $`3`
# $`3`$`1`
#                 mpg cyl  disp  hp drat    wt  qsec vs am gear carb
# Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
# Valiant        18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
# Toyota Corona  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
# 
# $`3`$`2`
#                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
# Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
# Dodge Challenger  15.5   8  318 150 2.76 3.520 16.87  0  0    3    2
# AMC Javelin       15.2   8  304 150 3.15 3.435 17.30  0  0    3    2
# Pontiac Firebird  19.2   8  400 175 3.08 3.845 17.05  0  0    3    2
# [...]

nested.dlply(mtcars, c("gear", "carb"), head, 2)
# $`3`
# $`3`$`1`
#                 mpg cyl disp  hp drat    wt  qsec vs am gear carb
# Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
# Valiant        18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
# 
# $`3`$`2`
#                    mpg cyl disp  hp drat   wt  qsec vs am gear carb
# Hornet Sportabout 18.7   8  360 175 3.15 3.44 17.02  0  0    3    2
# Dodge Challenger  15.5   8  318 150 2.76 3.52 16.87  0  0    3    2
# [...]

I doubt this is very efficient, but it does the job. I still welcome your suggestions. Ideally, I was hoping some package had already implemented this.

Upvotes: 3

A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

What about nesting split?

temp = lapply(split(mtcars, mtcars$gear), function(x) split(x, x$carb))
temp[["3"]]["1"]
# $`1`
#                 mpg cyl  disp  hp drat    wt  qsec vs am gear carb
# Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
# Valiant        18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
# Toyota Corona  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
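One way to extend this to any number of splitting variables, and to apply a function to each piece, is a small recursive wrapper around split and lapply. This is a minimal sketch in base R only; the helper name nested_split is made up here and does not come from any package, and drop = TRUE is an assumption so that unused factor levels do not create empty entries:

```r
# Hypothetical helper: split `df` by each column named in `by`, one level
# of nesting per column, then apply `fun` to the innermost pieces.
nested_split <- function(df, by, fun = identity, ...) {
  # Split on the first remaining variable
  pieces <- split(df, df[[by[1]]], drop = TRUE)
  if (length(by) == 1) {
    # Innermost level: apply the user's function to each piece
    lapply(pieces, fun, ...)
  } else {
    # Otherwise recurse on the remaining variables within each piece
    lapply(pieces, nested_split, by = by[-1], fun = fun, ...)
  }
}

res <- nested_split(mtcars, c("gear", "carb"), head, 2)
res[["3"]][["1"]]
```

Each level of the result is keyed by one splitting variable, so a third variable could be added with, for example, by = c("gear", "carb", "am") and accessed with a third [[ ]].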

Upvotes: 2
