Reputation: 3059
Is there an elegant way to use ddply() to obtain output not only for the most granular groups defined, but also for the groups of those sub-groups? In other words, for the cases where one of the classifiers is "any" or "either" or "doesn't matter". In the simple case of two grouping variables, this can be accomplished by a separate call to ddply; however, when there are three or more classifiers that can all be set to "any", this gets messy due to having to run ddply over and over again for every new combination of "any" + others.
Reproducible example:
library(plyr)
## create a data frame with three classification variables
## and two numeric variables:
df1 <- data.frame(classifier1 = LETTERS[sample(2, 200, replace = TRUE)],
                  classifier2 = letters[sample(3, 200, replace = TRUE)],
                  classifier3 = rep(c("foo", "bar"), 100),
                  VAR1 = runif(200, 50, 250),
                  VAR2 = rnorm(200, 85, 20))
## apply an arbitrary function to subsets of df1; that is, all unique
## combinations of the three classifiers.
dlply(df1, .(classifier1, classifier2, classifier3),
      function(df) lm(VAR1 ~ VAR2, data = df))
$A.a.bar
Call:
lm(formula = VAR1 ~ VAR2, data = df)
Coefficients:
(Intercept) VAR2
230.5555 -0.8591
$A.a.foo
Call:
lm(formula = VAR1 ~ VAR2, data = df)
Coefficients:
(Intercept) VAR2
128.3078 0.3631
...
Now, what if I want the same output for a few more groups, where any or all of the classifiers are left out? For example, to cover the case classifier1="any", I would include only classifier2 and classifier3 in the dlply statement, like this:
dlply(df1, .(classifier2,classifier3), function(df) lm(VAR1 ~ VAR2, data=df))
If I then wanted output for the case where classifier2 and classifier3 are both "any", I would again drop them from the dlply call and keep only classifier1:
dlply(df1, .(classifier1), function(df) lm(VAR1 ~ VAR2, data=df))
However, this gets unwieldy when I have many more than three classifiers and each one can be taken out (i.e. = "any"): the number of combinations grows quickly, since n classifiers have 2^n - 1 non-empty subsets to group by. Is there an elegant/fast way to obtain output for all the "groups of groups" of my data?
Upvotes: 3
Views: 566
Reputation: 115382
One approach would be to create a list of the combinations and then use Map to create a list of the results of each dlply call.
You can use combn in combination with lapply and do.call('c', ...) to create a list of all the combinations of 1, 2, ..., n variables:
xx <- do.call('c', lapply(1:3, function(m) {
  combn(x = names(df1)[1:3], m = m, simplify = FALSE)
}))
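To see what this list contains, here is a minimal base-R sketch that needs no data. The vector `classifiers` is only a stand-in for `names(df1)[1:3]`, and the `" + "` labels are my own choice for readability:

```r
# Stand-in for names(df1)[1:3] from the question
classifiers <- c("classifier1", "classifier2", "classifier3")

# All non-empty subsets of the classifiers: sizes 1, 2 and 3
xx <- do.call("c", lapply(seq_along(classifiers), function(m) {
  combn(x = classifiers, m = m, simplify = FALSE)
}))

# 3 singles + 3 pairs + 1 triple = 2^3 - 1 = 7 combinations
length(xx)

# One readable label per combination
sapply(xx, paste, collapse = " + ")
```

Each element of `xx` is a character vector of grouping variables, and any classifier missing from it is the one being treated as "any".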
You can then use this in a call to Map (which is a wrapper for mapply(..., SIMPLIFY = FALSE)):
results <- Map(f = function(x) {
  dlply(df1, .variables = x, .fun = lm, formula = VAR1 ~ VAR2)
}, xx)
Or you could just pass a function to combn, which will do the same thing:
results <- do.call('c', lapply(1:3, function(m) {
  combn(x = names(df1)[1:3], m = m, simplify = FALSE,
        FUN = function(vv) {
          dlply(df1, .variables = vv, .fun = lm, formula = VAR1 ~ VAR2)
        })
}))
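Either way, results comes back as an unnamed list, so it can be awkward to find a particular grouping. A possible follow-up, sketched under a few assumptions of mine: the seed is arbitrary, the "." naming scheme is just one option, and I fit the model through an explicit wrapper around lm rather than passing lm directly.

```r
library(plyr)

set.seed(1)  # arbitrary seed, only so the sketch is repeatable
df1 <- data.frame(classifier1 = LETTERS[sample(2, 200, replace = TRUE)],
                  classifier2 = letters[sample(3, 200, replace = TRUE)],
                  classifier3 = rep(c("foo", "bar"), 100),
                  VAR1 = runif(200, 50, 250),
                  VAR2 = rnorm(200, 85, 20))

# All non-empty combinations of the classifier columns
vars <- names(df1)[1:3]
xx <- do.call("c", lapply(seq_along(vars), function(m) {
  combn(x = vars, m = m, simplify = FALSE)
}))

# One dlply result (a list of lm fits) per combination
results <- Map(function(x) {
  dlply(df1, .variables = x, .fun = function(d) lm(VAR1 ~ VAR2, data = d))
}, xx)

# Assumed naming scheme: join the grouping variables with "."
names(results) <- sapply(xx, paste, collapse = ".")

# Models for classifier1 = "any", i.e. grouped by classifier2 and classifier3
results[["classifier2.classifier3"]]

# The fully pooled case (every classifier = "any") is not part of `results`
pooled <- lm(VAR1 ~ VAR2, data = df1)
```

Note that the combinations start at size 1, so the case where every classifier is "any" has to be fitted separately, as in the last line.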
Upvotes: 4