Sklenicka
Sklenicka

Reputation: 655

generate rows for each id by date sequence

my dataframe looks someway like this

df <- read.table(text="
                 id          start           end
    1            2      2018-10-01    2018-12-01
    2            3      2018-01-01    2018-04-01
", header=TRUE)

What I trying to achieve is get difference between start and day in months for each id and then generate new dataframe with each month for particular id. Result should be

result <- read.table(text="
                 id          date           
    1            2      2018-10-01
    2            2      2018-11-01
    3            2      2018-12-01     
    4            3      2018-01-01
    5            3      2018-02-01    
    6            3      2018-03-01    
    7            3      2018-04-01        
", header=TRUE)

Upvotes: 3

Views: 1591

Answers (2)

akrun
akrun

Reputation: 886948

We can do this easily with Map. Pass the Date converted 'start' and 'end' columnd of the dataset as arguments to Map, get the sequence of 'month's as a list and expand the 'id' based on the lengths of list as well as concatenate the list elements to create the expanded dataframe

lst1 <- Map(seq, MoreArgs = list(by = 'month'), as.Date(df$start), as.Date(df$end))
data.frame(id = rep(df$id, lengths(lst1)), date = do.call(c, lst1))
#  id       date
#1  2 2018-10-01
#2  2 2018-11-01
#3  2 2018-12-01
#4  3 2018-01-01
#5  3 2018-02-01
#6  3 2018-03-01
#7  3 2018-04-01

Or using tidyverse, we mutate the class of the 'start', 'end' columns to Date, using map2 (from purrr), get the sequence of dates from 'start' to 'end' by the 'month' and expand the data by unnesting the dataset

library(tidyverse)
df %>% 
  mutate_at(2:3, as.Date) %>% 
  transmute(id = id, date = map2(start, end,  ~ seq(.x, .y, by = 'month'))) %>% 
  unnest
#  id       date
#1  2 2018-10-01
#2  2 2018-11-01
#3  2 2018-12-01
#4  3 2018-01-01
#5  3 2018-02-01
#6  3 2018-03-01
#7  3 2018-04-01

Upvotes: 3

Ronak Shah
Ronak Shah

Reputation: 388817

Most straightforward way using base R functions is to create a sequence of monthly dates for each row, create a dataframe and rbind them together

do.call(rbind, with(df,lapply(1:nrow(df), function(i) 
   data.frame(id = id[i], date = seq(as.Date(start[i]), as.Date(end[i]), by = "month")))))

#  id       date
#1  2 2018-10-01
#2  2 2018-11-01
#3  2 2018-12-01
#4  3 2018-01-01
#5  3 2018-02-01
#6  3 2018-03-01
#7  3 2018-04-01

Upvotes: 3

Related Questions