Roberto

Reputation: 2990

Programming with dplyr 0.3

I am trying to group and summarise with a function, using the new underscore functions for standard evaluation provided in dplyr 0.3. However, I ran into an issue when trying to use lapply instead of a loop:

Small example

library(dplyr)

fruits <- c("APPLE", "PEAR", "BANANA")
makes <- c("HONDA", "FERRARI", "TESLA")
df <- data.frame(fruit = sample(fruits, 100, replace = TRUE),
                 make  = sample(makes, 100, replace = TRUE),
                 value = 1:100)
cols <- c("fruit", "make")

showTopTenFactors <- function(x, ...) {
  x %>%
    group_by_(...) %>%
    summarise(cnt = n()) %>%
    arrange(desc(cnt)) %>%
    head(10)
}

This loop gives me the desired output:

for(i in cols){
  showTopTenFactors(df, i) %>% print
}

Source: local data frame [3 x 2]

   fruit cnt
1  APPLE  49
2 BANANA  30
3   PEAR  21
Source: local data frame [3 x 2]

     make cnt
1   HONDA  35
2   TESLA  34
3 FERRARI  31

But when I try to replace the loop with

lapply(cols, showTopTenFactors, df)

I get the following error message:

 Error in UseMethod("group_by_") : 
  no applicable method for 'group_by_' applied to an object of class "character"

Upvotes: 0

Views: 3344

Answers (2)

IRTFM

Reputation: 263362

I don't think you actually need to create an anonymous function. lapply should be able to pass along an argument as long as it is named correctly:

> lapply(cols, showTopTenFactors, x=df)
[[1]]
Source: local data frame [3 x 2]

   fruit cnt
1 BANANA  41
2  APPLE  32
3   PEAR  27

[[2]]
Source: local data frame [3 x 2]

     make cnt
1 FERRARI  45
2   TESLA  30
3   HONDA  25

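You were letting the 'cols' values get matched to the x in your function. lapply passes each element of its first argument as the first positional argument of FUN, so in lapply(cols, showTopTenFactors, df) each column name was bound to x and df fell into the dots. That is why group_by_ ended up being dispatched on a character vector. This is not specific to dplyr-based functions; it is how argument matching works in R generally.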
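The matching rule can be seen without dplyr at all. The following is a minimal base-R sketch; describe_args is a hypothetical stand-in for showTopTenFactors that only reports which argument received what, and the small df here is made up for illustration.

```r
# Hypothetical stand-in for showTopTenFactors: same signature (x, ...),
# but it only reports the class of what landed in x and in the dots.
describe_args <- function(x, ...) {
  dots <- list(...)
  paste0("x is <", class(x)[1], ">; dots hold <", class(dots[[1]])[1], ">")
}

df <- data.frame(fruit = c("APPLE", "PEAR"), value = 1:2)
cols <- c("fruit", "make")

# Positional: each element of cols becomes x, and df falls into the dots.
positional <- lapply(cols, describe_args, df)

# Named: df is bound to x by name, so each element of cols falls into the dots.
named <- lapply(cols, describe_args, x = df)

print(positional[[1]])
print(named[[1]])
```

In the positional call x is a character value, which corresponds to the situation that produced the "no applicable method" error in the question; in the named call x is the data frame.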

Upvotes: 4

cdeterman

Reputation: 19960

Changing your lapply statement to the following should fix it:

lapply(cols, FUN = function(x) showTopTenFactors(df, x))
[[1]]
Source: local data frame [3 x 2]

   fruit cnt
1 BANANA  36
2   PEAR  36
3  APPLE  28

[[2]]
Source: local data frame [3 x 2]

     make cnt
1   HONDA  39
2   TESLA  33
3 FERRARI  28

Explicitly specifying the arguments of custom functions is generally a good approach within apply statements.
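As a rough illustration of that advice, here is a base-R sketch in which every argument is named at the call site. count_top is a hypothetical stand-in for showTopTenFactors built on table(), and the tiny df is made up so the example runs without dplyr.

```r
# Hypothetical base-R stand-in for showTopTenFactors: counts the values
# in one column and keeps the n most frequent.
count_top <- function(x, col, n = 10) {
  counts <- sort(table(x[[col]]), decreasing = TRUE)
  head(counts, n)
}

df <- data.frame(fruit = c("APPLE", "PEAR", "APPLE"),
                 make  = c("HONDA", "HONDA", "TESLA"))
cols <- c("fruit", "make")

# Both arguments are passed by name, so nothing depends on their position.
result <- lapply(cols, function(col) count_top(x = df, col = col))

# Label each list element with the column it summarises.
names(result) <- cols
print(result)
```

Naming the list elements is optional; it just makes it easier to tell which summary belongs to which column.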

Upvotes: 1
