justin cress

Reputation: 1841

Convert dplyr chain into a function

Given a column of dates, this counts the number of records in each month:

library(dplyr)
library(lubridate)

samp <- tbl_df(seq.Date(as.Date("2017-01-01"), as.Date("2017-12-01"), by="day"))

freq <- samp %>%
    filter(!is.na(value)) %>% 
    transmute(month = floor_date(value, "month")) %>%
    group_by(month) %>% summarise(adds = n())


freq
# A tibble: 12 x 2
        month  adds
       <date> <int>
 1 2017-01-01    31
 2 2017-02-01    28
 3 2017-03-01    31
 4 2017-04-01    30
 5 2017-05-01    31
 6 2017-06-01    30
 7 2017-07-01    31
 8 2017-08-01    31
 9 2017-09-01    30
10 2017-10-01    31
11 2017-11-01    30
12 2017-12-01     1

I would like to convert this to a function so that I can perform the operation on a number of variables. I have read the vignette on programming with dplyr, but I continue to have issues.

My attempt:

library(rlang)
count_x_month <- function(df, var, name){
    var <- enquo(var)
    name <- enquo(name)

    df %>%
    filter(!is.na(!!var)) %>% 
    transmute(month := floor_date(!!var, "month")) %>%
    group_by(month) %>% summarise(!!name := n())
} 

freq2 <- samp %>% count_x_month(value, out)

Error message:

 Error: invalid argument type 

Making this version of the function work would be a big help. More broadly, other ways to achieve the objective are welcome. One way to state the problem: given a data frame of customers and first purchase dates, count the number of customers purchasing for the first time in each month.

Update: The selected answer works in dplyr 0.7.4, but the RStudio environment I have access to has dplyr 0.5.0. What modifications are required to backport this function?
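For the dplyr 0.5.0 case, one option that avoids tidy evaluation altogether is to pass the column names as strings and pull the column with base R indexing. The following is only a sketch under that assumption: count_x_month_se is a hypothetical name, the sample data is rebuilt with data.frame() so the block is self-contained, and it has not been run against dplyr 0.5.0 itself.

```r
library(dplyr)
library(lubridate)

# Hypothetical string-based variant: `var` and `name` are character strings.
count_x_month_se <- function(df, var, name) {
  dates <- df[[var]]               # pull the column by name, no quoting needed
  dates <- dates[!is.na(dates)]    # same effect as filter(!is.na(...))

  out <- data.frame(month = floor_date(dates, "month")) %>%
    group_by(month) %>%
    summarise(n = n())

  # Rename the count column afterwards instead of using !!name := n()
  names(out)[names(out) == "n"] <- name
  out
}

samp <- data.frame(
  value = seq.Date(as.Date("2017-01-01"), as.Date("2017-12-01"), by = "day")
)
freq2 <- count_x_month_se(samp, "value", "out")
```

Because only group_by(), summarise() and n() are used, nothing here depends on enquo() or the := operator.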

Upvotes: 2

Views: 553

Answers (3)

dmi3kno

Reputation: 3055

You forgot to convert name to a string with quo_name():

library(rlang)
count_x_month <- function(df, var, name){
  var <- enquo(var)
  name <- enquo(name)
  name <- quo_name(name)

  df %>%
    filter(!is.na(!!var)) %>% 
    transmute(month := floor_date(!!var, "month")) %>%
    group_by(month) %>% 
    summarise(!!name := n())
} 

freq2 <- samp %>% count_x_month(value, out)
freq2

# A tibble: 12 x 2
        month   out
       <date> <int>
 1 2017-01-01    31
 2 2017-02-01    28
 3 2017-03-01    31
 4 2017-04-01    30
 5 2017-05-01    31
 6 2017-06-01    30
 7 2017-07-01    31
 8 2017-08-01    31
 9 2017-09-01    30
10 2017-10-01    31
11 2017-11-01    30
12 2017-12-01     1

See the "Different input and output variable" section of "Programming with dplyr":

We create the new names by pasting together strings, so we need quo_name() to convert the input expression to a string.
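As a small illustration of that conversion on its own, the sketch below captures a bare name the way enquo() does inside the function and turns it into a string. capture_name is a hypothetical helper used only for this example.

```r
library(rlang)

# Hypothetical helper: returns the quosure that enquo() captures.
capture_name <- function(name) enquo(name)

q <- capture_name(out)   # a quosure wrapping the symbol `out`
nm <- quo_name(q)        # the plain character string "out"

is_quosure(q)      # TRUE
is.character(nm)   # TRUE
```

It is this string, not the quosure, that is unquoted on the left-hand side of := in the function above.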

Upvotes: 2

Cybernetic

Reputation: 13354

Create a dataframe showing customer IDs and first purchase dates:

dates <- seq.Date(as.Date("2017-01-01"), as.Date("2017-12-01"), by="day")
dates_rep <- c(dates,dates,dates)
cust_ids <- paste0('id_', floor(runif(length(dates_rep), min=0, max=100000)))
cust_frame <- data.frame(ID=cust_ids, FP_DATE=dates_rep)

head(cust_frame)


Use the plyr package to aggregate by FP_DATE:

library(plyr)
count(cust_frame, c('FP_DATE'))

Given a data frame of customers and first purchase dates, this counts the customers purchasing for the first time on each date. For a count per month, FP_DATE first has to be rounded down to the month, for example with lubridate's floor_date().
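A minimal sketch of the per-month version, assuming lubridate is available. FP_MONTH is a column name introduced here for illustration, and the customer IDs are simplified to sequential values because they do not affect the counts.

```r
library(lubridate)

dates <- seq.Date(as.Date("2017-01-01"), as.Date("2017-12-01"), by = "day")
dates_rep <- c(dates, dates, dates)
cust_frame <- data.frame(
  ID = paste0("id_", seq_along(dates_rep)),  # simplified IDs for the sketch
  FP_DATE = dates_rep
)

# Round each first-purchase date down to the first day of its month.
cust_frame$FP_MONTH <- floor_date(cust_frame$FP_DATE, "month")

# Called with its namespace so plyr is not attached on top of dplyr.
monthly <- plyr::count(cust_frame, "FP_MONTH")
monthly
```

The result has one row per month, with the number of customers in the freq column.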


You can extend this to aggregate across any number of features in your dataset (the feature_* names below are placeholders for columns of your own):

count(cust_frame, c('FP_DATE', 'feature_b', 'feature_c', 'feature_d', 'feature_e'))

Upvotes: 0

akraf

Reputation: 3255

The error is caused by summarise(df, !!name := n()) and is solved by replacing the second line of the function body, name <- enquo(name), with

name <- substitute(name)

The reason, as far as I understand it, is that a quosure is more than a name: it also carries the environment it came from. That makes sense when a function receives a column name, because the function must know which data frame (the environment, in this case) the column belongs to in order to replace the name with its values.

However, name is meant to hold a new column name chosen by the user, so there is nothing to replace it with. I suspect that with name <- enquo(name), R tries to replace !!name with values instead of just inserting the new name, and then complains that there is no name on the LHS (because R replaced it with values?).

I am not sure whether substitute is the idiomatic "programming with dplyr" way, though. Comments are welcome.
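To make the difference between the two captures concrete, here is a small sketch comparing what enquo() and substitute() return for the same argument. with_enquo and with_substitute are hypothetical helpers written only for this comparison.

```r
library(rlang)

# Hypothetical helpers showing the two ways of capturing the argument.
with_enquo <- function(name) enquo(name)
with_substitute <- function(name) substitute(name)

q <- with_enquo(out)        # quosure: expression plus its environment
s <- with_substitute(out)   # bare symbol, no environment attached

is_quosure(q)     # TRUE
is.symbol(s)      # TRUE
as.character(s)   # "out"

# The expression stored inside the quosure is the same symbol.
identical(quo_get_expr(q), s)   # TRUE
```

So substitute() hands summarise() a plain symbol for the left-hand side, while enquo() hands it a quosure that still has to be reduced to a name first.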

Upvotes: 0
