Reputation: 31
I am trying to convert quarterly data into daily data by carrying the previous values forward onto the missing dates. The data consists of financial ratios for different stocks. Each row is identified by two columns: ticker and date. Since the same dates can repeat across stocks, I am not sure how to fill in the missing dates with previous values separately for each ticker.
Here is how a small sample of the data looks so far:
> df_new
de eps pe ps pb
APD 2015-09-30 1.373 1.6 21.463 2.772 3.784
APD 2015-12-31 1.325 1.68 21.284 2.893 3.805
APD 2016-03-31 1.411 -2.19 56.114 3.254 4.491
SWKS 2003-03-31 0.402 -0.04 <NA> <NA> <NA>
SWKS 2003-06-30 0.397 -0.04 -2.289 1.518 0.929
SWKS 2003-09-30 0.62 -0.29 -2.799 2.046 1.877
SWKS 2003-12-31 0.643 0.03 -25.426 2.045 1.905
SWKS 2004-03-31 0.657 -0.06 -32.004 2.641 2.579
SWKS 2004-06-30 0.584 0.09 -37.18 1.825 1.782
SWKS 2004-09-30 0.555 0.1 65.806 1.881 1.962
SWKS 2004-12-31 0.525 0.09 45.823 1.777 1.912
And I want it to look like this, with one row per day:
> df_new
de eps pe ps pb
APD 2015-09-30 1.373 1.6 21.463 2.772 3.784
APD 2015-10-01 1.373 1.6 21.463 2.772 3.784
APD 2015-10-02 1.373 1.6 21.463 2.772 3.784
APD 2015-10-03 1.373 1.6 21.463 2.772 3.784
...
APD 2015-12-31 1.325 1.68 21.284 2.893 3.805
APD 2016-01-01 1.325 1.68 21.284 2.893 3.805
APD 2016-01-02 1.325 1.68 21.284 2.893 3.805
APD 2016-01-03 1.325 1.68 21.284 2.893 3.805
...
APD 2016-03-31 1.411 -2.19 56.114 3.254 4.491
APD 2016-04-01 1.411 -2.19 56.114 3.254 4.491
APD 2016-04-02 1.411 -2.19 56.114 3.254 4.491
APD 2016-04-03 1.411 -2.19 56.114 3.254 4.491
...
SWKS 2003-03-31 0.402 -0.04 <NA> <NA> <NA>
SWKS 2003-04-01 0.402 -0.04 <NA> <NA> <NA>
SWKS 2003-04-02 0.402 -0.04 <NA> <NA> <NA>
SWKS 2003-04-03 0.402 -0.04 <NA> <NA> <NA>
...
SWKS 2003-06-30 0.397 -0.04 -2.289 1.518 0.929
and so on...
I searched for answers, and this question: Add missing xts/zoo data with linear interpolation in R, is somewhat close to what I want, though I am not sure what to do with the ticker symbol.
Thank you so much for your help!
Upvotes: 1
Views: 114
Reputation: 270075
Use by to apply the anonymous function shown to each symbol's rows. That function produces a grid g of dates and merges it with the original rows of the data frame, applying na.locf to fill in NA values. Finally we use do.call("rbind", ...) to put the "by" object produced back together.
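library(zoo) # na.locf
# For each symbol: build a daily date grid, merge it in, carry values forward
df <- do.call("rbind", by(df_new, df_new$symbol, function(x) {
  rng <- range(x$date, na.rm = TRUE)                  # first and last date
  g <- data.frame(date = seq(rng[1], rng[2], "day"))  # one row per day
  # na.rm = FALSE keeps rows whose leading values are genuinely NA
  na.locf(merge(x, g, all = TRUE), na.rm = FALSE)
}))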
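To see what the anonymous function does for a single symbol, the three steps (grid, merge, fill) can be run by hand. This is only a sketch: the two-row frame x below is a made-up excerpt with dates a few days apart so the output stays small, and it carries just one ratio column.

```r
library(zoo)  # provides na.locf

# Hypothetical excerpt for one ticker; the dates are shortened for illustration
x <- data.frame(symbol = "APD",
                date   = as.Date(c("2015-09-30", "2015-10-03")),
                eps    = c(1.6, 1.68))

# Grid with one row per calendar day between the first and last observation
g <- data.frame(date = seq(min(x$date), max(x$date), by = "day"))

# Outer join on the shared "date" column: days that exist only in the grid
# get NA in every other column
m <- merge(x, g, all = TRUE)

# Carry the last observation forward; na.rm = FALSE keeps rows that stay NA
filled <- na.locf(m, na.rm = FALSE)
print(filled)
```

The merged frame m has NA in symbol and eps for the two days in between, which is why the ticker column is filled by the same na.locf call as the ratios.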
Note: The input df_new in reproducible form is:
Lines <- "
APD 2015-09-30 1.373 1.6 21.463 2.772 3.784
APD 2015-12-31 1.325 1.68 21.284 2.893 3.805
APD 2016-03-31 1.411 -2.19 56.114 3.254 4.491
SWKS 2003-03-31 0.402 -0.04 <NA> <NA> <NA>
SWKS 2003-06-30 0.397 -0.04 -2.289 1.518 0.929
SWKS 2003-09-30 0.62 -0.29 -2.799 2.046 1.877
SWKS 2003-12-31 0.643 0.03 -25.426 2.045 1.905
SWKS 2004-03-31 0.657 -0.06 -32.004 2.641 2.579
SWKS 2004-06-30 0.584 0.09 -37.18 1.825 1.782
SWKS 2004-09-30 0.555 0.1 65.806 1.881 1.962
SWKS 2004-12-31 0.525 0.09 45.823 1.777 1.912"
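# na.strings turns the "<NA>" markers into real NA values, so pe, ps and pb
# are read as numeric. Rows that start with NA are then only kept by
# na.locf() when it is called with na.rm = FALSE.
df_new <- read.table(text = Lines, na.strings = "<NA>",
  col.names = c("symbol", "date", "de", "eps", "pe", "ps", "pb"))
df_new$date <- as.Date(df_new$date)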
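One way to check that the expansion worked is to confirm that every symbol has exactly one row per calendar day between its first and last date. The helper is_daily_complete below is a hypothetical name introduced for this sketch, not part of zoo or base R, and the two small frames are made up to show both outcomes.

```r
# Hypothetical helper: TRUE if, within each symbol, consecutive dates are
# exactly one day apart (symbols with fewer than two rows pass trivially)
is_daily_complete <- function(d) {
  all(sapply(split(d$date, d$symbol), function(dates) {
    dates <- sort(dates)
    length(dates) < 2 || all(diff(as.numeric(dates)) == 1)
  }))
}

# Quarterly input: the two dates are months apart, so the check fails
quarterly <- data.frame(symbol = c("APD", "APD"),
                        date   = as.Date(c("2015-09-30", "2015-12-31")))

# Daily frame covering the same span: the check passes
daily <- data.frame(symbol = "APD",
                    date   = seq(as.Date("2015-09-30"),
                                 as.Date("2015-12-31"), by = "day"))

print(is_daily_complete(quarterly))
print(is_daily_complete(daily))
```

Applied to the result of the by() call, the same function should return TRUE if the daily grid was built for every symbol.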
Upvotes: 2