Alex

Reputation: 2780

Expand date-range strings into a long-format date column

I have date ranges stored as strings in the following form:

Date                      Value
<chr>                      <dbl>
[2014-1-24 - 2014-2-2]      1.1
[2014-2-3 - 2014-3-2]       2.2
.                           .
.                           .
.                           .

This goes on for many years. I would like to convert it to a long format with one row per day, as follows:

Date          Value
<date>        <dbl>
2014-01-24     1.1
2014-01-25     1.1
2014-01-26     1.1
2014-01-27     1.1
2014-01-28     1.1
2014-01-29     1.1
2014-01-30     1.1
2014-01-31     1.1
2014-02-01     1.1
2014-02-02     1.1
2014-02-03     2.2
2014-02-04     2.2
2014-02-05     2.2
.               .
.               .
.               .

What is a clean way to accomplish this?

Upvotes: 0

Views: 164

Answers (2)

akrun

Reputation: 887098

Here is an option using data.table with lubridate. Group by 'Value' (assuming it is unique; if not, group by the row sequence instead), split 'Date' into two columns with tstrsplit, convert each to Date class with ymd (from lubridate), and build the sequence of dates between them with Reduce and seq.

library(data.table)
library(lubridate)
setDT(df1)[, .(Date = Reduce(function(...) seq(..., by = '1 day'), 
               lapply(tstrsplit(Date, "\\s-\\s"), ymd))), Value][, .(Date, Value)]
#          Date Value
# 1: 2014-01-24   1.1
# 2: 2014-01-25   1.1
# 3: 2014-01-26   1.1
# 4: 2014-01-27   1.1
# 5: 2014-01-28   1.1
# 6: 2014-01-29   1.1
# 7: 2014-01-30   1.1
# 8: 2014-01-31   1.1
# 9: 2014-02-01   1.1
#10: 2014-02-02   1.1
#11: 2014-02-03   2.2
#12: 2014-02-04   2.2
#13: 2014-02-05   2.2
#14: 2014-02-06   2.2
# - -
# - -
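
If 'Value' is not unique, the "sequence of rows" grouping mentioned above could look roughly like the sketch below. The sample data, the `row` grouping column, and the bracket-stripping step are assumptions for illustration; base `as.Date` stands in for `ymd` here only to keep the example free of extra dependencies.

```r
library(data.table)

# Hypothetical input in which the same Value (1.1) appears on two rows
df1 <- data.frame(
  Date  = c("[2014-1-24 - 2014-1-26]", "[2014-1-27 - 2014-1-28]"),
  Value = c(1.1, 1.1),
  stringsAsFactors = FALSE
)

res <- setDT(df1)[, {
  # drop the square brackets, then split "start - end" into two pieces
  parts <- tstrsplit(gsub("\\[|\\]", "", Date), "\\s-\\s")
  # expand the range into one row per day; Value is recycled
  .(Date = seq(as.Date(parts[[1]]), as.Date(parts[[2]]), by = "1 day"),
    Value = Value)
}, by = .(row = seq_len(nrow(df1)))][, .(Date, Value)]

res
```

Grouping by a row index instead of 'Value' keeps each range separate even when two ranges share the same value.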

Upvotes: 1

akuiper

Reputation: 214957

Use dplyr and tidyr:

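library(dplyr); library(tidyr); library(stringr)  # str_match_all comes from stringr

df %>% 
    # extract the start and end date strings, then expand each pair into a daily sequence
    mutate(Date = str_match_all(Date, '\\d{4}-\\d{1,2}-\\d{1,2}'), 
           Date = lapply(Date, function(d) seq(as.Date(d[1]), as.Date(d[2]), by='day'))) %>% 
    # name the list column explicitly; a bare unnest() is deprecated in current tidyr
    unnest(Date) 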

#   Value       Date
#1    1.1 2014-01-24
#2    1.1 2014-01-25
#3    1.1 2014-01-26
#4    1.1 2014-01-27
#5    1.1 2014-01-28
#6    1.1 2014-01-29
#7    1.1 2014-01-30
#8    1.1 2014-01-31
#9    1.1 2014-02-01
#10   1.1 2014-02-02
#11   2.2 2014-02-03
#12   2.2 2014-02-04
# ...

Use purrr:

library(stringr); library(purrr); library(tibble)

# extract the start and end dates from each Date string
df$Date <- map(str_match_all(df$Date, '\\d{4}-\\d{1,2}-\\d{1,2}'), as.Date)

# map over rows and expand each range into a daily sequence with seq (seq.Date);
# tibble() replaces the deprecated data_frame()
pmap_df(df, ~ tibble(Date = seq(.x[1], .x[2], by='day'), Value = .y))

# A tibble: 38 x 2
#         Date Value
#       <date> <dbl>
# 1 2014-01-24   1.1
# 2 2014-01-25   1.1
# 3 2014-01-26   1.1
# 4 2014-01-27   1.1
# 5 2014-01-28   1.1
# 6 2014-01-29   1.1
# 7 2014-01-30   1.1
# 8 2014-01-31   1.1
# 9 2014-02-01   1.1
#10 2014-02-02   1.1
# ... with 28 more rows
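
For comparison, the same extract-then-expand idea can be sketched in base R without any packages. The sample `df` below is reconstructed from the two rows shown in the question, assuming the square brackets are part of the strings; `dates` and `long` are names made up for this sketch.

```r
# Hypothetical sample input mirroring the first two rows of the question
df <- data.frame(
  Date  = c("[2014-1-24 - 2014-2-2]", "[2014-2-3 - 2014-3-2]"),
  Value = c(1.1, 2.2),
  stringsAsFactors = FALSE
)

# Pull the start and end date strings out of each range
dates <- regmatches(df$Date,
                    gregexpr("[0-9]{4}-[0-9]{1,2}-[0-9]{1,2}", df$Date))

# Build one small data frame per input row, then stack them
long <- do.call(rbind, Map(function(d, v) {
  data.frame(Date = seq(as.Date(d[1]), as.Date(d[2]), by = "day"), Value = v)
}, dates, df$Value))

head(long)
```

This is slower than the data.table or tidyverse versions on large inputs because it binds many small data frames, but it has no dependencies.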

Upvotes: 1
