Christopher Penn
Christopher Penn

Reputation: 539

Disaggregate quarterly data to daily data in R keeping values?

How does one easily disaggregate quarterly data to daily data? In this case I'm using 10 years of US GDP data which have quarterly observations, and I want to expand the data frame to daily level, carrying over the GDP value each day until the next observation.

Reprex table:

structure(list(thedate = structure(c(14426, 14518, 14610, 14700, 
14791, 14883, 14975, 15065, 15156, 15248, 15340, 15431, 15522, 
15614, 15706, 15796, 15887, 15979, 16071, 16161, 16252, 16344, 
16436, 16526, 16617, 16709, 16801, 16892, 16983, 17075, 17167, 
17257, 17348, 17440, 17532, 17622, 17713, 17805, 17897, 17987
), class = "Date"), gdp = c(1.5, 4.5, 1.5, 3.7, 3, 2, -1, 2.9, 
-0.1, 4.7, 3.2, 1.7, 0.5, 0.5, 3.6, 0.5, 3.2, 3.2, -1.1, 5.5, 
5, 2.3, 3.2, 3, 1.3, 0.1, 2, 1.9, 2.2, 2, 2.3, 2.2, 3.2, 3.5, 
2.5, 3.5, 2.9, 1.1, 3.1, 2.1)), class = "data.frame", row.names = c(NA, 
-40L))

We see above:

2009-07-01 | 1.5
2009-10-01 | 4.5

The intended output would look like:

2009-07-01 | 1.5
2009-07-02 | 1.5
2009-07-03 | 1.5
etc.
2009-10-01 | 4.5
2009-10-02 | 4.5
2009-10-03 | 4.5

Upvotes: 4

Views: 882

Answers (2)

Cole
Cole

Reputation: 11255

Here's a base solution:

last_quarter_end_date <- seq.Date(df$thedate[nrow(df)], by = 'quarter', length.out = 2)[-1]-1
seqs <- diff(c(df$thedate, last_quarter_end_date))

data.frame(thedate = rep(df$thedate, seqs) + sequence(seqs)-1
           , gdp = rep(df$gdp, seqs))

Basically, the difference between dates is how many times you need to repeat a GDP column. Also, I can do seq_len() for each difference to add back to the original date.

Performance This approach is efficient although I'll note that 0.6 ms isn't really much different than 15 ms in the big picture.

Unit: microseconds
      expr     min       lq      mean  median       uq     max neval
 cole_base   528.1   554.15   690.379   644.9   663.75  3225.7   100
  d_b_base 15735.0 15994.40 17395.754 16243.9 18108.30 38761.8   100
 Ben_tidyr  2808.7  2936.40  3356.324  3076.6  3149.65  8065.1   100

Complete code for reference:

DF <- structure(list(thedate = structure(c(14426, 14518, 14610, 14700, 
                                           14791, 14883, 14975, 15065, 15156, 15248, 15340, 15431, 15522, 
                                           15614, 15706, 15796, 15887, 15979, 16071, 16161, 16252, 16344, 
                                           16436, 16526, 16617, 16709, 16801, 16892, 16983, 17075, 17167, 
                                           17257, 17348, 17440, 17532, 17622, 17713, 17805, 17897, 17987
), class = "Date"), gdp = c(1.5, 4.5, 1.5, 3.7, 3, 2, -1, 2.9, 
                            -0.1, 4.7, 3.2, 1.7, 0.5, 0.5, 3.6, 0.5, 3.2, 3.2, -1.1, 5.5, 
                            5, 2.3, 3.2, 3, 1.3, 0.1, 2, 1.9, 2.2, 2, 2.3, 2.2, 3.2, 3.5, 
                            2.5, 3.5, 2.9, 1.1, 3.1, 2.1)), class = "data.frame", row.names = c(NA, 
                                                                                                -40L))

library(microbenchmark)
library(tidyr)

microbenchmark(cole_base = {
  last_quarter_end_date <- seq.Date(DF$thedate[nrow(DF)], by = 'quarter', length.out = 2)[-1]-1
  seqs <- diff(c(DF$thedate, last_quarter_end_date))

  data.frame(thedate = rep(DF$thedate, seqs) + sequence(seqs)-1
             , gdp = rep(DF$gdp, seqs))
}
, d_b_base = {
  do.call(rbind, lapply(2:NROW(DF), function(i){
    data.frame(date = head(seq.Date(DF$thedate[i-1], DF$thedate[i], "days"), -1),
               gdp = DF$gdp[i - 1])
     }))
}
, Ben_tidyr = {
  DF %>%
    complete(thedate = seq.Date(min(thedate), max(thedate), by="day")) %>%
    fill(gdp)
}
)

Upvotes: 1

Ben
Ben

Reputation: 30474

Here is a tidyr and zoo package answer that uses 'last observation carried forward' after inserting a sequence of dates with NA:

library(tidyverse)
library(zoo)

data %>%
  complete(thedate = seq.Date(min(thedate), max(thedate), by="day")) %>%
  do(na.locf(.))

Edit: Thanks to Shree for reminding that tidyr::fill would eliminate need for zoo:

library(tidyverse)

data %>%
  complete(thedate = seq.Date(min(thedate), max(thedate), by="day")) %>%
  fill(gdp)

Upvotes: 2

Related Questions