Brisbane Pom
Brisbane Pom

Reputation: 611

Generate records in an R data frame between two dates

I have a data frame that consists of customers scheduled subscription payments as follows:

CusID <- c(1,2,3)
FromDate <- c(ymd("2019-01-01"), ymd("2019-01-04"), ymd("2019-02-02"))
ToDate <-c(ymd("2019-01-16"), ymd("2019-01-15"), ymd("2019-04-03"))
Amount <- c(5,10,12)
Frequency <- c("Weekly", "Fortnightly", "Monthly")
Input <-  data.frame(CusID, Amount, Frequency, FromDate, ToDate)

For each row (customer), I wish to loop from the FromDate to the ToDate and output one row of each data for each scheduled payment that falls between those dates, resulting in the following data frame:

CusID <- c(1,1,1,2,3,3,3)
PaymentDate <- c(ymd("2019-01-01"), ymd("2019-01-08"), ymd("2019-01-15"),
                 ymd("2019-01-04"),ymd("2019-02-02"),ymd("2019-03-02"),ymd("2019-04-02"))
Amount <- c(5,5,5,10,12,12,12)
Output <-  data.frame(CusID, PaymentDate, Amount)

What is an efficient way to achieve this using R (and preferably using dplyr / tidyverse functions)?

In SAS my approach would be to use a DO / WHILE LOOP and OUTPUT statement to write a new line for each scheduled payment. e.g.

data Output;
set Input;
PaymentDate = FromDate;
do while (PaymentDate < ToDate);
Payment = Amount;
PaymentDate = PaymentDate + (7 / 14 / 30 ~ logic based on Frequency);
output;
loop;
run;

(The key here in SAS is the output statement - it explicitly writes a new record each time it is invoked, thus can be used in a loop to write multiple output lines per input line).

Is there an equivalent method available in R, or is a different approach recommended?

Upvotes: 0

Views: 615

Answers (4)

Humpelstielzchen
Humpelstielzchen

Reputation: 6441

Another option using tidyverse

Input %>%
  mutate(Frequency = case_when(Frequency == "Weekly" ~ 7L,
                               Frequency == "Fortnightly" ~ 14L,
                               Frequency == "Monthly" ~ 30L,
                               TRUE ~ 0L)) %>%
  group_by(CusID) %>%
  group_modify(~ {PaymentDate <- seq.Date(from = .x$FromDate, to = .x$ToDate, by = .x$Frequency) 
                 crossing(.x[,1], PaymentDate)})  


# A tibble: 7 x 3
# Groups:   CusID [3]
  CusID PaymentDate Amount
  <dbl> <date>       <dbl>
1     1 2019-01-01       5
2     1 2019-01-08       5
3     1 2019-01-15       5
4     2 2019-01-04      10
5     3 2019-02-02      12
6     3 2019-03-04      12
7     3 2019-04-03      12

Payment dates are a little different from your expected output because seq.Date adds 30 days taking into account the different number of days in those months.

UPDATE:

Here is a more verbatim solution

Input %>% 
  mutate(PaymentDate = FromDate,
         RFrequency = case_when(Frequency == "Weekly" ~ '1 week',
                                Frequency == "Fortnightly" ~ '2 weeks',
                                Frequency == "Monthly" ~ '1 month')) %>% 
  group_by(CusID, Amount) %>% 
  expand(PaymentDate = seq.Date(FromDate,ToDate, by = RFrequency)) 

# A tibble: 7 x 3
# Groups:   CusID, Amount [3]
  CusID Amount PaymentDate
  <dbl>  <dbl> <date>     
1     1      5 2019-01-01 
2     1      5 2019-01-08 
3     1      5 2019-01-15 
4     2     10 2019-01-04 
5     3     12 2019-02-02 
6     3     12 2019-03-02 
7     3     12 2019-04-02 

Upvotes: 3

Brisbane Pom
Brisbane Pom

Reputation: 611

A mashup of Humpelstielzchen and user2474226's answers, to bring all logic into a single dplyr step.

Output <- Input %>% 
  mutate(PaymentDate = FromDate,
         RFrequency = case_when(Frequency == "Weekly" ~ '1 week',
                                Frequency == "Fortnightly" ~ '2 weeks',
                                Frequency == "Monthly" ~ '1 month')) %>% 
  group_by(CusID) %>% 
  complete(PaymentDate = seq.Date(FromDate,ToDate, by = RFrequency)) %>% 
  fill(PaymentDate,Amount) %>% 
  select(CusID, PaymentDate, Amount)

Upvotes: 0

user2474226
user2474226

Reputation: 1502

I tweaked your Input data.frame so that the Frequency values are strings, not factors.

You could create a helper table freq_mapping to convert from your Frequency to the frequency format R likes. This would avoid the 30 day issue that one of the other answers pointed out.

freq_mapping <- data.frame(Frequency=c('Weekly', 'Fortnightly', 'Monthly'), 
                           RFrequency = c('1 week', '2 weeks', '1 month'),
                           stringsAsFactors =  FALSE)

Then merge Input with this:

Input <- Input %>%
    inner_join(freq_mapping, by = 'Frequency')

Now you can create the payment dates:

Input$PaymentDate <- Input$FromDate
Input %>% 
    group_by(CusID) %>% 
    complete(PaymentDate = seq.Date(FromDate,ToDate, by = RFrequency)) %>% 
    fill(PaymentDate,Amount) %>% 
    select(CusID, PaymentDate, Amount)

Upvotes: 1

F&#233;lix Cuneo
F&#233;lix Cuneo

Reputation: 159

not so easy problem for me. The solution is not beautiful but it should somehow do the work. You'll see there is a problem for the monthly payment which is not always 30, but otherwise it should work. But nicer solution surely exist.

    library(data.table)
Input <-  data.frame(CusID, Amount, Frequency, FromDate, ToDate)

Input=data.table(Input)
Input[Frequency=="Weekly",freq:=7][Frequency=="Fortnightly",freq:=14][Frequency=="Monthly",freq:=30]
Input[,Ratio:=(ToDate-FromDate)/freq]

#What is the maximum rows ? for a customer ?
NREP=as.integer(max(ceiling(Input$Ratio)))

Input[,Rep:=1][,PaymentDate:=FromDate]
for(i in 1:NREP){
Inputtemp=copy(Input)
Inputtemp[,FromDate:=FromDate+freq]
Input=rbind(Input,Inputtemp)  
}

#Remove invalid rows
Input=unique(Input)

Input=Input[!(FromDate>ToDate),]
setorder(Input,CusID)
Input=Input[,c("CusID","FromDate","Amount")]
setnames(Input,"FromDate","PaymentDate")
Input==data.table(Output)

Upvotes: 0

Related Questions