Sahithi Visvanathan
Sahithi Visvanathan

Reputation: 29

Accounting for zeros in Time Series data

Hi I am transforming my current data set into time series by dividing it into weeks and counts for each week. Currently the weeks where the counts are zero its omitting in the data set for example. Below is the example data set. I want to include all the weeks with counts zero as well. How do i transform it using dplyr.

2013-10-6 1
2014-08-03 1
2014-10-12 1

Upvotes: 1

Views: 861

Answers (2)

G. Grothendieck
G. Grothendieck

Reputation: 270045

1) dplyr Assuming the input is DF shown reproducibly in the Note at the end right join DF to a data frame of weeks and then replace NA values with zero. In the data frame in the Note at the end value is integer but if it is double in your application then use 0 in place of 0L.

library(dplyr)

DF %>%
   right_join(data.frame(date = seq(first(.$date), last(.$date), 7))) %>%
   mutate(value = coalesce(value, 0L))

2) zoo Since this is a time series many calculations can be simplified by using a time series representation. Converting to zoo object z we can use this one-liner which converts from zoo to ts and back which has the effect of inserting NA values. Since ts class cannot represent dates directly we use aggregate to convert back to Date and then na.fill to replace the NAs with 0.

library(zoo)
z <- read.zoo(DF, frequency = 7)

na.fill(aggregate(as.zoo(as.ts(z)), as.Date, c), 0)

This could also be written as:

library(magrittr)

z %>% as.ts %>% as.zoo %>% aggregate(as.Date, c) %>% na.fill(0)

Note

Lines <- "2013-10-6 1
2014-08-03 1
2014-10-12 1"
DF <- read.table(text = Lines, col.names = c("date", "value"))
DF$date <- as.Date(DF$date)

Upvotes: 1

www
www

Reputation: 39174

We can use complete and full_seq from the tidyr package.

library(dplyr)
library(tidyr)

dat2 <- dat %>%
  mutate(V1 = as.Date(V1)) %>%
  complete(V1 = full_seq(V1, period = 7), fill = list(V2 = 0))
dat2
# # A tibble: 54 x 2
#    V1            V2
#    <date>     <dbl>
#  1 2013-10-06     1
#  2 2013-10-13     0
#  3 2013-10-20     0
#  4 2013-10-27     0
#  5 2013-11-03     0
#  6 2013-11-10     0
#  7 2013-11-17     0
#  8 2013-11-24     0
#  9 2013-12-01     0
# 10 2013-12-08     0
# # ... with 44 more rows

Data

dat <- read.table(text = "'2013-10-6' 1
'2014-08-03' 1
                  '2014-10-12' 1",
                  header = FALSE, stringsAsFactors = FALSE)

Upvotes: 1

Related Questions