Reputation: 89057
My question's title almost matches the dlply (plyr package) description, except for the "nested" part.
Let me explain with an example:
library(plyr)
res <- dlply(mtcars, c("gear", "carb"), identity)
head(res, 2)
# $`3.1`
# mpg cyl disp hp drat wt qsec vs am gear carb
# Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
# Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
# Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
#
# $`3.2`
# mpg cyl disp hp drat wt qsec vs am gear carb
# Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
# Dodge Challenger 15.5 8 318 150 2.76 3.520 16.87 0 0 3 2
# AMC Javelin 15.2 8 304 150 3.15 3.435 17.30 0 0 3 2
# Pontiac Firebird 19.2 8 400 175 3.08 3.845 17.05 0 0 3 2
As you can see, the output is a list where the names (keys) are the concatenation of the two variables I used for splitting the data, e.g. "3.1" is the key for (gear = 3, carb = 1).
Instead, I would like my result to be a nested list so the elements can be accessed through two sets of keys, one for each of my splitting variables: res[["3"]][["1"]].
Is there something around, not necessarily from the plyr package, that can achieve this? I'd like the answer to be generalizable to any number of splitting variables. Also, it is important that I can apply any function, although my example used the identity function, resulting in a mere split of the data. Thank you for your suggestions.
Upvotes: 1
Views: 1783
Reputation: 89057
I came up with a solution myself; it uses recursion:
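nested.dlply <- function(df, by, fun, ...) {
  require(plyr)
  if (length(by) == 1) {
    # Base case: one splitting variable left, so apply fun to each piece.
    dlply(df, by, fun, ...)
  } else {
    # Split on the first variable, then recurse on the remaining ones within each piece.
    dlply(df, by[1], nested.dlply, by[-1], fun, ...)
  }
}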
Here are a couple of examples:
nested.dlply(mtcars, c("gear", "carb"), identity)
# $`3`
# $`3`$`1`
# mpg cyl disp hp drat wt qsec vs am gear carb
# Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
# Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
# Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
#
# $`3`$`2`
# mpg cyl disp hp drat wt qsec vs am gear carb
# Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
# Dodge Challenger 15.5 8 318 150 2.76 3.520 16.87 0 0 3 2
# AMC Javelin 15.2 8 304 150 3.15 3.435 17.30 0 0 3 2
# Pontiac Firebird 19.2 8 400 175 3.08 3.845 17.05 0 0 3 2
# [...]
nested.dlply(mtcars, c("gear", "carb"), head, 2)
# $`3`
# $`3`$`1`
# mpg cyl disp hp drat wt qsec vs am gear carb
# Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
# Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
#
# $`3`$`2`
# mpg cyl disp hp drat wt qsec vs am gear carb
# Hornet Sportabout 18.7 8 360 175 3.15 3.44 17.02 0 0 3 2
# Dodge Challenger 15.5 8 318 150 2.76 3.52 16.87 0 0 3 2
# [...]
I doubt this is very efficient but it does the job. I still welcome your suggestions. Ideally I was hoping some package already implemented it.
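For what it's worth, here is a quick check that the result can be indexed with one key per splitting variable, and that the same recursion extends to three variables. This is only a sketch: it assumes plyr is installed, and it repeats the function so it runs on its own.

```r
library(plyr)

# Same recursive function as above, repeated so this snippet is self-contained.
nested.dlply <- function(df, by, fun, ...) {
  if (length(by) == 1) {
    dlply(df, by, fun, ...)
  } else {
    dlply(df, by[1], nested.dlply, by[-1], fun, ...)
  }
}

# Two splitting variables: one key per variable.
res <- nested.dlply(mtcars, c("gear", "carb"), identity)
rownames(res[["3"]][["1"]])

# Three splitting variables, applying nrow to count the rows in each group.
res3 <- nested.dlply(mtcars, c("cyl", "gear", "carb"), nrow)
res3[["4"]][["4"]][["1"]]
```

The keys are character strings, so numeric group values have to be quoted when indexing.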
Upvotes: 3
Reputation: 193517
What about nesting split?
temp = lapply(split(mtcars, mtcars$gear), function(x) split(x, x$carb))
temp[["3"]]["1"]
# $`1`
# mpg cyl disp hp drat wt qsec vs am gear carb
# Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
# Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
# Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
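The nested split above handles exactly two variables. One way it might be generalized to any number of splitting variables, and to an arbitrary function, is to recurse in base R. The name nested_split and its arguments are made up for this sketch; it is not taken from any package.

```r
# Hypothetical helper (not from any package): recursive nested split in base R.
nested_split <- function(df, by, fun = identity, ...) {
  # No splitting variables left: apply fun to this piece.
  if (length(by) == 0) {
    return(fun(df, ...))
  }
  # Split on the first variable, then recurse on the rest within each piece.
  pieces <- split(df, df[[by[1]]], drop = TRUE)
  lapply(pieces, function(piece) nested_split(piece, by[-1], fun, ...))
}

# Two variables, plain split (fun defaults to identity).
res <- nested_split(mtcars, c("gear", "carb"))
res[["3"]][["1"]]

# Three variables, keeping the first two rows of each group.
top2 <- nested_split(mtcars, c("cyl", "gear", "carb"), head, 2)
top2[["4"]][["4"]][["1"]]
```

Because each piece is split on its own values, combinations that never occur in the data simply do not appear in the nested list.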
Upvotes: 2