chopin_is_the_best

Reputation: 2101

Expand a dataframe following a set of rules

I have a fairly complex problem that I am not able to tackle.

I have a data frame that I am working with in dplyr:

id      date        type
9373    2019-09-29  6-months 
9945    2019-08-15  3-months 
9945    2019-11-13  3-months 
9615    2019-12-28  3-months 
11465   2019-07-13  3-months 
11465   2019-10-11  3-months 

Reproducible example:

library(tidyverse)

df <- data.frame(stringsAsFactors=FALSE,
          id = c(9373, 9945, 9945, 9615, 11465, 11465),
        date = c("2019-09-29", "2019-08-15", "2019-11-13", "2019-12-28",
                 "2019-07-13", "2019-10-11"),
        type = c("6-months", "3-months", "3-months", "3-months", "3-months",
                 "3-months")) %>%
  mutate(date = as.Date(date))

Each id is a transaction that happened on a given date; each transaction runs on either a 3-month or a 6-month cycle, as specified in type.

I want to expand these transactions into their monthly counterparts up to the current date. For example, the first transaction, 9373, is of type 6-months, so it has to be repeated 6 times on a 30-day cycle starting from 2019-09-29, but only up to the current day (today is 2020-01-07). That leaves just 4 monthly transactions, since the last two have not happened yet.

The same applies to the 3-months transactions, always taking into account both the starting date and the current date.
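As a quick check of the 30-day rule, here is a base R sketch for transaction 9373 alone. It assumes "today" is fixed at 2020-01-07 (the date given above) instead of Sys.Date(), so the result does not depend on when it is run.

```r
# Worked check for transaction 9373 (type 6-months).
# Assumption: "today" is fixed at 2020-01-07, as stated in the question.
start <- as.Date("2019-09-29")
today <- as.Date("2020-01-07")

# All 6 scheduled monthly dates, 30 days apart (numeric `by` is in days)
schedule <- seq(start, by = 30, length.out = 6)

# Keep only the dates that have already happened
past <- schedule[schedule <= today]
print(past)  # 2019-09-29 2019-10-29 2019-11-28 2019-12-28
```

The fifth date, 2020-01-27, is already past 2020-01-07, which is why only 4 of the 6 instalments remain.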

Example of the final result:

id      date        type
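9373    2019-09-29  6-months # original transaction, start of a 6-months cycle
9373    2019-10-29  6-months 
9373    2019-11-28  6-months 
9373    2019-12-28  6-months 
9945    2019-08-15  3-months # original transaction, start of a 3-months cycle
9945    2019-09-14  3-months 
9945    2019-10-14  3-months 
9945    2019-11-13  3-months # original transaction, start of a 3-months cycle
9945    2019-12-13  3-months 
9615    2019-12-28  3-months # original transaction, start of a 3-months cycle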
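For comparison, a minimal base R sketch of the whole expansion (no dplyr needed). The helper name expand_transactions and its today argument are hypothetical; the step is the question's fixed 30 days, and the number of instalments is read from type by stripping the "-months" suffix, assuming type is always of the form "N-months".

```r
# Hypothetical helper: expand each transaction row into its monthly
# instalments (30 days apart), keeping only dates up to `today`.
expand_transactions <- function(df, today = Sys.Date()) {
  rows <- lapply(seq_len(nrow(df)), function(i) {
    # Assumes type looks like "3-months" / "6-months": "6-months" -> 6
    n <- as.integer(sub("-months$", "", df$type[i]))
    dates <- seq(df$date[i], by = 30, length.out = n)
    dates <- dates[dates <= today]          # drop instalments not yet due
    data.frame(id   = rep(df$id[i], length(dates)),
               date = dates,
               type = rep(df$type[i], length(dates)),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)                       # stack the per-row pieces
}

df <- data.frame(
  id   = c(9373, 9945, 9945, 9615, 11465, 11465),
  date = as.Date(c("2019-09-29", "2019-08-15", "2019-11-13", "2019-12-28",
                   "2019-07-13", "2019-10-11")),
  type = c("6-months", "3-months", "3-months", "3-months", "3-months",
           "3-months"),
  stringsAsFactors = FALSE
)

out <- expand_transactions(df, today = as.Date("2020-01-07"))
print(out)
```

With today fixed at 2020-01-07 this should give 16 rows: the 10 listed in the example result plus 6 for id 11465 (2019-07-13 through 2019-12-10). Passing today explicitly keeps the result reproducible; with the Sys.Date() default it changes with the run date.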

Any help is highly appreciated!

Upvotes: 2

Views: 42

Answers (2)

Rohit

Reputation: 2017

You can use rowwise() and do(), like so:

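df %>% 
  rowwise() %>% 
  do({
    # number of instalments after the first one: "6-months" -> 5
    p <- as.numeric(gsub('\\D+','',.$type))-1
    tibble(
      id=.$id,
      # every 30 days from the transaction date, stopping at today or at
      # the last scheduled instalment, whichever comes first
      date=seq(.$date,pmin(Sys.Date(),.$date+p*30),30),
      type=.$type
    )
  }) %>% 
  ungroup()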

# A tibble: 16 x 3
#      id date       type    
# * <dbl> <date>     <chr>   
# 1  9373 2019-09-29 6-months
# 2  9373 2019-10-29 6-months
# 3  9373 2019-11-28 6-months
# 4  9373 2019-12-28 6-months
# 5  9945 2019-08-15 3-months
# 6  9945 2019-09-14 3-months
# 7  9945 2019-10-14 3-months
# 8  9945 2019-11-13 3-months
# 9  9945 2019-12-13 3-months
#10  9615 2019-12-28 3-months
#11 11465 2019-07-13 3-months
#12 11465 2019-08-12 3-months
#13 11465 2019-09-11 3-months
#14 11465 2019-10-11 3-months
#15 11465 2019-11-10 3-months
#16 11465 2019-12-10 3-months
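One caveat with the capped seq() call in this answer: if a transaction's date is later than Sys.Date(), the end point pmin(Sys.Date(), ...) falls before the start, and base R's seq() with a positive 30-day step then stops with a wrong-sign error instead of returning no rows. A guarded base R sketch of the same capped-sequence idea (the function name capped_schedule is hypothetical):

```r
# Hypothetical helper mirroring the capped-sequence idea above:
# from `start` up to min(today, last scheduled instalment), by 30 days.
capped_schedule <- function(start, months, today = Sys.Date()) {
  # A transaction dated after `today` has no instalments yet; without this
  # guard, seq() would be asked to step forward from a later to an earlier date.
  if (start > today) return(start[0])   # zero-length Date vector
  last_scheduled <- start + (months - 1) * 30
  seq(start, min(today, last_scheduled), by = 30)
}

capped_schedule(as.Date("2019-09-29"), 6, today = as.Date("2020-01-07"))
capped_schedule(as.Date("2020-02-01"), 3, today = as.Date("2020-01-07"))
```

The first call gives the same 4 dates as in the output above; the second, a transaction dated after "today", returns an empty Date vector rather than an error.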

Upvotes: 1

Ronak Shah

Reputation: 388907

Here is one way using dplyr and tidyr functions.

library(dplyr)
library(tidyr)

df %>%
  #Extract the number from type column
  mutate(num = readr::parse_number(type)) %>%
  #For each transaction (ids repeat, so group by row number, not by id)
  group_by(row = row_number()) %>%
  #Create a sequence of num dates starting from date, 30 days apart
  complete(id, type, date = seq(date, by = "30 days", length.out = num)) %>%
  #Remove rows which have date value greater than today
  filter(date <= Sys.Date()) %>%
  ungroup() %>%
  select(-num, -row)

# A tibble: 16 x 3
#      id type     date      
#   <dbl> <chr>    <date>    
# 1  9373 6-months 2019-09-29
# 2  9373 6-months 2019-10-29
# 3  9373 6-months 2019-11-28
# 4  9373 6-months 2019-12-28
# 5  9945 3-months 2019-08-15
# 6  9945 3-months 2019-09-14
# 7  9945 3-months 2019-10-14
# 8  9945 3-months 2019-11-13
# 9  9945 3-months 2019-12-13
#10  9615 3-months 2019-12-28
#11 11465 3-months 2019-07-13
#12 11465 3-months 2019-08-12
#13 11465 3-months 2019-09-11
#14 11465 3-months 2019-10-11
#15 11465 3-months 2019-11-10
#16 11465 3-months 2019-12-10
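Note that by = "30 days" follows the question's fixed 30-day cycle, which drifts away from calendar months. A small base R sketch contrasting it with by = "1 month" for the first 9945 transaction (illustrative only; the question asks for the 30-day version):

```r
start <- as.Date("2019-08-15")  # first 9945 transaction

# Fixed 30-day cycle, as the question requires
every_30_days <- seq(start, by = "30 days", length.out = 3)
# Calendar-month cycle, shown only for contrast
every_month <- seq(start, by = "1 month", length.out = 3)

print(every_30_days)  # 2019-08-15 2019-09-14 2019-10-14
print(every_month)    # 2019-08-15 2019-09-15 2019-10-15
```

The 30-day version matches the 9945 rows in both outputs above; switching to calendar months would shift every instalment after the first.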

Upvotes: 1
