Ben Nutzer
Ben Nutzer

Reputation: 1163

Unstack lubridate's interval class

I am trying to transform a dataframe df consisting of a value-column, two date columns (start and end), and a interval-column (duration) into the long format by unnesting/unstacking the duration column.

library(dplyr)
library(lubridate)

df <- data.frame(value = letters[1:3], start = as_date(1:3), end = as_date(3:1)+3) %>% 
          mutate(duration = interval(start, end))

The expected outcome would a dataframe where value, start, and end are duplicated for each day as defined by duration. For instance, the value 'a' would appear 6 times each time on a different day (2nd, 3rd, 4th, 5th, 6th, 7th January 1970).

I tried to use the unnest function from the tidyr package but nothing happened.

tidyr::unnest(df, duration) 

Any help is greatly appreciated :)

Upvotes: 4

Views: 136

Answers (6)

AnilGoyal
AnilGoyal

Reputation: 26218

with uncount

df %>% uncount(as.integer(duration/(24*60*60)) +1) %>%
  group_by(value) %>%
  mutate(date = row_number() -1 + start)

# A tibble: 12 x 5
# Groups:   value [3]
   value start      end        duration                       date      
   <chr> <date>     <date>     <Interval>                     <date>    
 1 a     1970-01-02 1970-01-07 1970-01-02 UTC--1970-01-07 UTC 1970-01-02
 2 a     1970-01-02 1970-01-07 1970-01-02 UTC--1970-01-07 UTC 1970-01-03
 3 a     1970-01-02 1970-01-07 1970-01-02 UTC--1970-01-07 UTC 1970-01-04
 4 a     1970-01-02 1970-01-07 1970-01-02 UTC--1970-01-07 UTC 1970-01-05
 5 a     1970-01-02 1970-01-07 1970-01-02 UTC--1970-01-07 UTC 1970-01-06
 6 a     1970-01-02 1970-01-07 1970-01-02 UTC--1970-01-07 UTC 1970-01-07
 7 b     1970-01-03 1970-01-06 1970-01-03 UTC--1970-01-06 UTC 1970-01-03
 8 b     1970-01-03 1970-01-06 1970-01-03 UTC--1970-01-06 UTC 1970-01-04
 9 b     1970-01-03 1970-01-06 1970-01-03 UTC--1970-01-06 UTC 1970-01-05
10 b     1970-01-03 1970-01-06 1970-01-03 UTC--1970-01-06 UTC 1970-01-06
11 c     1970-01-04 1970-01-05 1970-01-04 UTC--1970-01-05 UTC 1970-01-04
12 c     1970-01-04 1970-01-05 1970-01-04 UTC--1970-01-05 UTC 1970-01-05

Upvotes: 2

Anoushiravan R
Anoushiravan R

Reputation: 21908

You can also use the following solution. Since we are going to create duplicate rows we can wrap the operation in a list and then use unnest_longer. purrr package functions have always been my first choice but you can also use this as an alternative.

library(dplyr)
library(tidyr)
library(lubridate)


df %>% 
  group_by(value) %>%
  mutate(date = list(start + 0:(duration/ddays(1)))) %>%
  unnest_longer(date) %>%
  select(-duration)


# A tibble: 12 x 4
# Groups:   value [3]
   value start      end        date      
   <chr> <date>     <date>     <date>    
 1 a     1970-01-02 1970-01-07 1970-01-02
 2 a     1970-01-02 1970-01-07 1970-01-03
 3 a     1970-01-02 1970-01-07 1970-01-04
 4 a     1970-01-02 1970-01-07 1970-01-05
 5 a     1970-01-02 1970-01-07 1970-01-06
 6 a     1970-01-02 1970-01-07 1970-01-07
 7 b     1970-01-03 1970-01-06 1970-01-03
 8 b     1970-01-03 1970-01-06 1970-01-04
 9 b     1970-01-03 1970-01-06 1970-01-05
10 b     1970-01-03 1970-01-06 1970-01-06
11 c     1970-01-04 1970-01-05 1970-01-04
12 c     1970-01-04 1970-01-05 1970-01-05

Upvotes: 3

Wimpel
Wimpel

Reputation: 27732

A data.table approach

library(data.table)
setDT(df)[, .(date = seq(start, end, by = 1)), by = .(value)]
# value       date
# 1:     a 1970-01-02
# 2:     a 1970-01-03
# 3:     a 1970-01-04
# 4:     a 1970-01-05
# 5:     a 1970-01-06
# 6:     a 1970-01-07
# 7:     b 1970-01-03
# 8:     b 1970-01-04
# 9:     b 1970-01-05
#10:     b 1970-01-06
#11:     c 1970-01-04
#12:     c 1970-01-05

Upvotes: 1

Peter H.
Peter H.

Reputation: 2164

You can't unstack a column of intervals and expect it to generate all dates in between, but by using seq you can generate them yourself. Try this:

library(tidyverse)
library(lubridate)

df %>%
  rowwise() %>% 
  summarise(
    value, dates = seq(start, end, by = 1)
  )

#> # A tibble: 12 x 2
#>    value dates     
#>    <chr> <date>    
#>  1 a     1970-01-02
#>  2 a     1970-01-03
#>  3 a     1970-01-04
#>  4 a     1970-01-05
#>  5 a     1970-01-06
#>  6 a     1970-01-07
#>  7 b     1970-01-03
#>  8 b     1970-01-04
#>  9 b     1970-01-05
#> 10 b     1970-01-06
#> 11 c     1970-01-04
#> 12 c     1970-01-05

Created on 2021-05-18 by the reprex package (v1.0.0)

Upvotes: 3

Ronak Shah
Ronak Shah

Reputation: 388797

To extract the start and end date from interval you can use int_start and int_end, create a sequence of dates with map2 and unnest.

library(dplyr)
library(purrr)
library(tidyr)
library(lubridate)

df %>%
  mutate(date = map2(int_start(duration), int_end(duration), 
                ~seq(as.Date(.x), as.Date(.y), by = 'day'))) %>%
  #This will also work but would return date of class POSIXct
  #mutate(date = map2(int_start(duration), int_end(duration),seq,by = 'day')) %>%
  unnest(date) %>%
  select(-duration)

#    value start      end        date      
#   <chr> <date>     <date>     <date>    
# 1 a     1970-01-02 1970-01-07 1970-01-02
# 2 a     1970-01-02 1970-01-07 1970-01-03
# 3 a     1970-01-02 1970-01-07 1970-01-04
# 4 a     1970-01-02 1970-01-07 1970-01-05
# 5 a     1970-01-02 1970-01-07 1970-01-06
# 6 a     1970-01-02 1970-01-07 1970-01-07
# 7 b     1970-01-03 1970-01-06 1970-01-03
# 8 b     1970-01-03 1970-01-06 1970-01-04
# 9 b     1970-01-03 1970-01-06 1970-01-05
#10 b     1970-01-03 1970-01-06 1970-01-06
#11 c     1970-01-04 1970-01-05 1970-01-04
#12 c     1970-01-04 1970-01-05 1970-01-05

Upvotes: 2

Andrew Gustar
Andrew Gustar

Reputation: 18425

I don't think interval is helping here - seq.Date might be better...

library(purrr) #as well as those you have

df <- data.frame(value = letters[1:3], start = as_date(1:3), end = as_date(3:1)+3) %>% 
   mutate(day = map2(start, end, seq.Date, by = "day")) %>% 
   unnest(day)

df
# A tibble: 12 x 4
   value start      end        day       
   <chr> <date>     <date>     <date>    
 1 a     1970-01-02 1970-01-07 1970-01-02
 2 a     1970-01-02 1970-01-07 1970-01-03
 3 a     1970-01-02 1970-01-07 1970-01-04
 4 a     1970-01-02 1970-01-07 1970-01-05
 5 a     1970-01-02 1970-01-07 1970-01-06
 6 a     1970-01-02 1970-01-07 1970-01-07
 7 b     1970-01-03 1970-01-06 1970-01-03
 8 b     1970-01-03 1970-01-06 1970-01-04
 9 b     1970-01-03 1970-01-06 1970-01-05
10 b     1970-01-03 1970-01-06 1970-01-06
11 c     1970-01-04 1970-01-05 1970-01-04
12 c     1970-01-04 1970-01-05 1970-01-05

Upvotes: 2

Related Questions