Reputation: 2990
I am trying to group and summarise with a function, using the new underscore functions for standard evaluation provided in dplyr 0.3. However, I ran into an issue when trying to use lapply instead of a loop:
Small example:
library(dplyr)

fruits <- c("APPLE", "PEAR", "BANANA")
makes <- c("HONDA", "FERRARI", "TESLA")
df <- data.frame(fruit = sample(fruits, 100, replace = TRUE),
                 make = sample(makes, 100, replace = TRUE),
                 value = 1:100)
cols <- c("fruit", "make")
# Count rows per group and return the ten most frequent groups
showTopTenFactors <- function(x, ...) x %>%
  group_by_(...) %>%
  summarise(cnt = n()) %>%
  arrange(desc(cnt)) %>%
  head(10)
Now this loop gives me the desired output:
for (i in cols) {
  showTopTenFactors(df, i) %>% print
}
Source: local data frame [3 x 2]
fruit cnt
1 APPLE 49
2 BANANA 30
3 PEAR 21
Source: local data frame [3 x 2]
make cnt
1 HONDA 35
2 TESLA 34
3 FERRARI 31
But when I try to substitute it with
lapply(cols, showTopTenFactors, df)
I get the following error message:
Error in UseMethod("group_by_") :
no applicable method for 'group_by_' applied to an object of class "character"
Upvotes: 0
Views: 3344
Reputation: 263362
I don't think you actually need to create an anonymous function. lapply should be able to pass along an argument as long as it is named correctly:
> lapply(cols, showTopTenFactors, x=df)
[[1]]
Source: local data frame [3 x 2]
fruit cnt
1 BANANA 41
2 APPLE 32
3 PEAR 27
[[2]]
Source: local data frame [3 x 2]
make cnt
1 FERRARI 45
2 TESLA 30
3 HONDA 25
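You were letting the 'cols' values get matched to the x in your function. lapply passes each element of its first argument as the first unnamed argument of the function, so each column name was bound to x and df ended up in the dots, which is why group_by_ was dispatched on a character vector. Naming x=df fills x first and leaves the column name to fall into the dots. This is not specific to dplyr-based functions but is rather a generic R issue.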
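The matching behaviour can be checked without dplyr. The following is only a minimal base R sketch: describeArgs is a hypothetical helper (it is not part of the question or of any package) that reports which class ended up in x and which in the dots, and the small df here is a stand-in for the one in the question.

```r
# Hypothetical helper for illustration only: reports how its arguments were matched.
describeArgs <- function(x, ...) {
  dotClasses <- sapply(list(...), function(a) class(a)[1])
  paste0("x is ", class(x)[1], "; ... holds ", paste(dotClasses, collapse = ", "))
}

df <- data.frame(value = 1:3)   # stand-in data frame, assumed for this sketch
cols <- c("fruit", "make")

# Unnamed: each element of cols is matched to x, df falls into the dots
unnamed <- lapply(cols, describeArgs, df)
print(unnamed[[1]])   # "x is character; ... holds data.frame"

# Named: df is matched to x by name, each element of cols falls into the dots
named <- lapply(cols, describeArgs, x = df)
print(named[[1]])     # "x is data.frame; ... holds character"
```

The same reasoning applies to showTopTenFactors: only the second form hands a data frame to group_by_.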
Upvotes: 4
Reputation: 19960
Changing your lapply statement to the following should fix it:
lapply(cols, FUN = function(x) showTopTenFactors(df, x))
[[1]]
Source: local data frame [3 x 2]
fruit cnt
1 BANANA 36
2 PEAR 36
3 APPLE 28
[[2]]
Source: local data frame [3 x 2]
make cnt
1 HONDA 39
2 TESLA 33
3 FERRARI 28
Explicitly specifying the arguments of custom functions is generally a good approach within apply statements.
Upvotes: 1