Reputation: 915
I'm trying filter an order table down to each persons first year worth of orders.
My data is in the following format, in which each row represents an order, but I have added customer-level columns to represent their first order date (Recruitment Date
) as well as the date that marks a year since each customers first order (1st Year Since Recruitment
) and second year since recruitment; the last column is the amount paid on the current order.
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 1876202 obs. of 6 variables:
$ Brand_Acc : chr "B000000001" "B000000002" "B000000002" "B000000002" ...
$ salesdate : Date, format: "2008-03-10" "2008-02-19" "2008-07-14" "2010-08-25" ...
$ Recruitment Date : Date, format: "2008-03-10" "2008-02-19" NA NA ...
$ 1st Year Since Recruitment: Date, format: "2009-03-10" "2009-02-19" NA NA ...
$ 2nd Year Since Recruitment: Date, format: "2010-03-10" "2010-02-19" NA NA ...
$ TotalDiscount : num 97.9 349.9 184.9 284.9 348.9 ...
I want to return a dataframe that captures each customers first year worth of orders.
I have tried the following:
df %>%
group_by(Brand_Acc) %>%
filter(salesdate, between(`Recruitment Date`, `1st Year Since Recruitment`))
But I get this error:
Error in filter_impl(.data, quo) : Evaluation error: argument "right" is missing, with no default.
What is the correct way of doing this?
Edit showing dput of first 5 rows:
dput(df)
structure(list(Brand_Acc = c("B000000001", "B000000002", "B000000002",
"B000000002", "B000000006"), salesdate = structure(c(13948, 13928,
14074, 14846, 13934), class = "Date"), ordertype = c("Recruitment",
"Recruitment", "Conversion", "Active Order", "Recruitment"),
actv_channel = c("MainMail", "MainMail", "Outbound-Other",
"MainMail", "MainMail"), TotalDiscount = c(97.87, 349.88,
184.94, 284.94, 348.9), campaignparentid = c("9017", "9017",
"9035", "9557", "9017"), BrandAccount_Brand = c("wp", "wp",
"wp", "wp", "wp"), recrtype = c("STNRD", "STNRD", "STNRD",
"STNRD", "STNRD"), POA_CODE = structure(c(1937L, 2302L, 2302L,
2302L, 466L), .Label = c("0", "200", "800", "801", "804"), class = "factor"),
`Recruitment Date` = structure(c(13948, 13928, NA, NA, 13934
), class = "Date"), `1st Year Since Recruitment` = structure(c(14313,
14294, NA, NA, 14300), class = "Date"), `2nd Year Since Recruitment` = structure(c(14678,
14659, NA, NA, 14665), class = "Date"), `3rd Year Since Recruitment` = structure(c(15043,
15024, NA, NA, 15030), class = "Date")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -5L))
> ```
Upvotes: 0
Views: 377
Reputation: 15072
Surprisingly between
is not vectorised in its left
and right
arguments, which you might have expected since it describes itself as a simple replacement for the combination of <=
and >=
. We simply have to resort to doing it the long way:
library(tidyverse)
df <- structure(list(Brand_Acc = c("B000000001", "B000000002", "B000000002", "B000000002", "B000000006"), salesdate = structure(c(13948, 13928, 14074, 14846, 13934), class = "Date"), ordertype = c("Recruitment", "Recruitment", "Conversion", "Active Order", "Recruitment"), actv_channel = c("MainMail", "MainMail", "Outbound-Other", "MainMail", "MainMail"), TotalDiscount = c(97.87, 349.88, 184.94, 284.94, 348.9), campaignparentid = c("9017", "9017", "9035", "9557", "9017"), BrandAccount_Brand = c("wp", "wp", "wp", "wp", "wp"), recrtype = c("STNRD", "STNRD", "STNRD", "STNRD", "STNRD"), POA_CODE = structure(c(1937L, 2302L, 2302L, 2302L, 466L), .Label = c("0", "200", "800", "801", "804"), class = "factor"), `Recruitment Date` = structure(c(13948, 13928, NA, NA, 13934), class = "Date"), `1st Year Since Recruitment` = structure(c(14313, 14294, NA, NA, 14300), class = "Date"), `2nd Year Since Recruitment` = structure(c(14678, 14659, NA, NA, 14665), class = "Date"), `3rd Year Since Recruitment` = structure(c(15043, 15024, NA, NA, 15030), class = "Date")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -5L))
df %>%
group_by(Brand_Acc) %>%
filter(salesdate >= `Recruitment Date` & salesdate <= `1st Year Since Recruitment`)
#> # A tibble: 3 x 13
#> # Groups: Brand_Acc [3]
#> Brand_Acc salesdate ordertype actv_channel TotalDiscount
#> <chr> <date> <chr> <chr> <dbl>
#> 1 B0000000… 2008-03-10 Recruitm… MainMail 97.9
#> 2 B0000000… 2008-02-19 Recruitm… MainMail 350.
#> 3 B0000000… 2008-02-25 Recruitm… MainMail 349.
#> # … with 8 more variables: campaignparentid <chr>,
#> # BrandAccount_Brand <chr>, recrtype <chr>, POA_CODE <fct>, `Recruitment
#> # Date` <date>, `1st Year Since Recruitment` <date>, `2nd Year Since
#> # Recruitment` <date>, `3rd Year Since Recruitment` <date>
Created on 2019-02-13 by the reprex package (v0.2.1)
There was also a syntax error in between
though that is not relevant now:
filter(salesdate %>% between(`Recruitment Date`, `1st Year Since Recruitment`))
Upvotes: 2