Reputation: 611
I have a data frame that consists of customers scheduled subscription payments as follows:
CusID <- c(1,2,3)
FromDate <- c(ymd("2019-01-01"), ymd("2019-01-04"), ymd("2019-02-02"))
ToDate <-c(ymd("2019-01-16"), ymd("2019-01-15"), ymd("2019-04-03"))
Amount <- c(5,10,12)
Frequency <- c("Weekly", "Fortnightly", "Monthly")
Input <- data.frame(CusID, Amount, Frequency, FromDate, ToDate)
For each row (customer), I wish to loop from the FromDate to the ToDate and output one row of each data for each scheduled payment that falls between those dates, resulting in the following data frame:
CusID <- c(1,1,1,2,3,3,3)
PaymentDate <- c(ymd("2019-01-01"), ymd("2019-01-08"), ymd("2019-01-15"),
ymd("2019-01-04"),ymd("2019-02-02"),ymd("2019-03-02"),ymd("2019-04-02"))
Amount <- c(5,5,5,10,12,12,12)
Output <- data.frame(CusID, PaymentDate, Amount)
What is an efficient way to achieve this using R (and preferably using dplyr / tidyverse functions)?
In SAS my approach would be to use a DO / WHILE LOOP and OUTPUT statement to write a new line for each scheduled payment. e.g.
data Output;
set Input;
PaymentDate = FromDate;
do while (PaymentDate < ToDate);
Payment = Amount;
PaymentDate = PaymentDate + (7 / 14 / 30 ~ logic based on Frequency);
output;
loop;
run;
(The key here in SAS is the output statement - it explicitly writes a new record each time it is invoked, thus can be used in a loop to write multiple output lines per input line).
Is there an equivalent method available in R, or is a different approach recommended?
Upvotes: 0
Views: 615
Reputation: 6441
Another option using tidyverse
Input %>%
mutate(Frequency = case_when(Frequency == "Weekly" ~ 7L,
Frequency == "Fortnightly" ~ 14L,
Frequency == "Monthly" ~ 30L,
TRUE ~ 0L)) %>%
group_by(CusID) %>%
group_modify(~ {PaymentDate <- seq.Date(from = .x$FromDate, to = .x$ToDate, by = .x$Frequency)
crossing(.x[,1], PaymentDate)})
# A tibble: 7 x 3
# Groups: CusID [3]
CusID PaymentDate Amount
<dbl> <date> <dbl>
1 1 2019-01-01 5
2 1 2019-01-08 5
3 1 2019-01-15 5
4 2 2019-01-04 10
5 3 2019-02-02 12
6 3 2019-03-04 12
7 3 2019-04-03 12
Payment dates are a little different from your expected output because seq.Date
adds 30 days taking into account the different number of days in those months.
UPDATE:
Here is a more verbatim solution
Input %>%
mutate(PaymentDate = FromDate,
RFrequency = case_when(Frequency == "Weekly" ~ '1 week',
Frequency == "Fortnightly" ~ '2 weeks',
Frequency == "Monthly" ~ '1 month')) %>%
group_by(CusID, Amount) %>%
expand(PaymentDate = seq.Date(FromDate,ToDate, by = RFrequency))
# A tibble: 7 x 3
# Groups: CusID, Amount [3]
CusID Amount PaymentDate
<dbl> <dbl> <date>
1 1 5 2019-01-01
2 1 5 2019-01-08
3 1 5 2019-01-15
4 2 10 2019-01-04
5 3 12 2019-02-02
6 3 12 2019-03-02
7 3 12 2019-04-02
Upvotes: 3
Reputation: 611
A mashup of Humpelstielzchen and user2474226's answers, to bring all logic into a single dplyr step.
Output <- Input %>%
mutate(PaymentDate = FromDate,
RFrequency = case_when(Frequency == "Weekly" ~ '1 week',
Frequency == "Fortnightly" ~ '2 weeks',
Frequency == "Monthly" ~ '1 month')) %>%
group_by(CusID) %>%
complete(PaymentDate = seq.Date(FromDate,ToDate, by = RFrequency)) %>%
fill(PaymentDate,Amount) %>%
select(CusID, PaymentDate, Amount)
Upvotes: 0
Reputation: 1502
I tweaked your Input data.frame so that the Frequency values are strings, not factors.
You could create a helper table freq_mapping to convert from your Frequency to the frequency format R likes. This would avoid the 30 day issue that one of the other answers pointed out.
freq_mapping <- data.frame(Frequency=c('Weekly', 'Fortnightly', 'Monthly'),
RFrequency = c('1 week', '2 weeks', '1 month'),
stringsAsFactors = FALSE)
Then merge Input with this:
Input <- Input %>%
inner_join(freq_mapping, by = 'Frequency')
Now you can create the payment dates:
Input$PaymentDate <- Input$FromDate
Input %>%
group_by(CusID) %>%
complete(PaymentDate = seq.Date(FromDate,ToDate, by = RFrequency)) %>%
fill(PaymentDate,Amount) %>%
select(CusID, PaymentDate, Amount)
Upvotes: 1
Reputation: 159
not so easy problem for me. The solution is not beautiful but it should somehow do the work. You'll see there is a problem for the monthly payment which is not always 30, but otherwise it should work. But nicer solution surely exist.
library(data.table)
Input <- data.frame(CusID, Amount, Frequency, FromDate, ToDate)
Input=data.table(Input)
Input[Frequency=="Weekly",freq:=7][Frequency=="Fortnightly",freq:=14][Frequency=="Monthly",freq:=30]
Input[,Ratio:=(ToDate-FromDate)/freq]
#What is the maximum rows ? for a customer ?
NREP=as.integer(max(ceiling(Input$Ratio)))
Input[,Rep:=1][,PaymentDate:=FromDate]
for(i in 1:NREP){
Inputtemp=copy(Input)
Inputtemp[,FromDate:=FromDate+freq]
Input=rbind(Input,Inputtemp)
}
#Remove invalid rows
Input=unique(Input)
Input=Input[!(FromDate>ToDate),]
setorder(Input,CusID)
Input=Input[,c("CusID","FromDate","Amount")]
setnames(Input,"FromDate","PaymentDate")
Input==data.table(Output)
Upvotes: 0