Reputation: 1841
Given a column of dates, this will count the number of records in each month
library(dplyr)
library(lubridate)
samp <- tbl_df(seq.Date(as.Date("2017-01-01"), as.Date("2017-12-01"), by="day"))
freq <- samp %>%
filter(!is.na(value)) %>%
transmute(month = floor_date(value, "month")) %>%
group_by(month) %>% summarise(adds = n())
freq
# A tibble: 12 x 2
month adds
<date> <int>
1 2017-01-01 31
2 2017-02-01 28
3 2017-03-01 31
4 2017-04-01 30
5 2017-05-01 31
6 2017-06-01 30
7 2017-07-01 31
8 2017-08-01 31
9 2017-09-01 30
10 2017-10-01 31
11 2017-11-01 30
12 2017-12-01 1
>
I would like to convert this to a function, so that I can perform the operation on a number of variables. Have read the vignette on dplyr programming, but continue to have issues.
My attempt;
library(rlang)
count_x_month <- function(df, var, name){
var <- enquo(var)
name <- enquo(name)
df %>%
filter(!is.na(!!var)) %>%
transmute(month := floor_date(!!var, "month")) %>%
group_by(month) %>% summarise(!!name := n())
}
freq2 <- samp %>% count_x_month(value, out)
Error message;
Error: invalid argument type
Making this version of the function work will be a big help. More broadly, other ways to achieve the objective would be welcome. One way to state the problem; given a dataframe of customers and first purchase dates, count the number of customers purchasing for the first time in each month.
update: The selected answer works in dplyr 0.7.4, but the rstudio environment I have access to has dplyr 0.5.0. What modifications are required to 'backport' this function?
Upvotes: 2
Views: 553
Reputation: 3055
You forgot to quo_name
it
library(rlang)
count_x_month <- function(df, var, name){
var <- enquo(var)
name <- enquo(name)
name <- quo_name(name)
df %>%
filter(!is.na(!!var)) %>%
transmute(month := floor_date(!!var, "month")) %>%
group_by(month) %>%
summarise(!!name := n())
}
freq2 <- samp %>% count_x_month(value, out)
# A tibble: 12 x 2
month out
<date> <int>
1 2017-01-01 31
2 2017-02-01 28
3 2017-03-01 31
4 2017-04-01 30
5 2017-05-01 31
6 2017-06-01 30
7 2017-07-01 31
8 2017-08-01 31
9 2017-09-01 30
10 2017-10-01 31
11 2017-11-01 30
12 2017-12-01 1
See "Different input and output variable" section of "Programming with dplyr":
We create the new names by pasting together strings, so we need quo_name() to convert the input expression to a string.
Upvotes: 2
Reputation: 13354
Create a dataframe showing customer IDs and first purchase dates:
dates <- seq.Date(as.Date("2017-01-01"), as.Date("2017-12-01"), by="day")
dates_rep <- c(dates,dates,dates)
cust_ids <- paste('id_', floor(runif(length(dates_rep), min=0, max=100000)))
cust_frame <- data.frame(ID=cust_ids, FP_DATE=dates_rep)
head(cust_frame)
Use the plyr package to aggregate by FP_DATE:
library(plyr)
count(cust_frame, c('FP_DATE'))
Therefore, given a dataframe of customers and first purchase dates, we get a count of the number of customers purchasing for the first time in each month.
You can extend this to aggregate across any number of features in your dataset:
count(cust_frame, c('FP_DATE', 'feature_b', 'feature_c', 'feature_d', 'feature_e'))
Upvotes: 0
Reputation: 3255
The error is caused by summarise(df, !!name := n())
and is solved by replacing the second line of the function with
name <- substitute(name)
The reason, as far as I understand it is, that a quosure is not only its name, but it carries with it the environment from where it came. This makes sense when specifying column names in functions. The function must know from which data frame (=environment in this case) the column comes to replace the name with the values.
However, name
shall take a new name, specified by the user. There is nothing to replace it with. I suspect if using name <- enquo(name)
, R wants to replace !!name
by values instead of just putting in the new name. Therefore it complains that on the LHS there is no name (because R replaced it by values(?))
Not sure though if substitute
is the ideomatic "programming with dplyr" way though. Comments are welcome.
Upvotes: 0