Reputation: 29
Hi I am transforming my current data set into time series by dividing it into weeks and counts for each week. Currently the weeks where the counts are zero its omitting in the data set for example. Below is the example data set. I want to include all the weeks with counts zero as well. How do i transform it using dplyr.
2013-10-6 1
2014-08-03 1
2014-10-12 1
Upvotes: 1
Views: 861
Reputation: 270045
1) dplyr Assuming the input is DF
shown reproducibly in the Note at the end right join DF
to a data frame of weeks and then replace NA values with zero. In the data frame in the Note at the end value
is integer but if it is double in your application then use 0 in place of 0L.
library(dplyr)
DF %>%
right_join(data.frame(date = seq(first(.$date), last(.$date), 7))) %>%
mutate(value = coalesce(value, 0L))
2) zoo Since this is a time series many calculations can be simplified by using a time series representation. Converting to zoo object z
we can use this one-liner which converts from zoo to ts and back which has the effect of inserting NA values. Since ts class cannot represent dates directly we use aggregate to convert back to Date and then na.fill to replace the NAs with 0.
library(zoo)
z <- read.zoo(DF, frequency = 7)
na.fill(aggregate(as.zoo(as.ts(z)), as.Date, c), 0)
This could also be written as:
library(magrittr)
z %>% as.ts %>% as.zoo %>% aggregate(as.Date, c) %>% na.fill(0)
Lines <- "2013-10-6 1
2014-08-03 1
2014-10-12 1"
DF <- read.table(text = Lines, col.names = c("date", "value"))
DF$date <- as.Date(DF$date)
Upvotes: 1
Reputation: 39174
We can use complete
and full_seq
from the tidyr
package.
library(dplyr)
library(tidyr)
dat2 <- dat %>%
mutate(V1 = as.Date(V1)) %>%
complete(V1 = full_seq(V1, period = 7), fill = list(V2 = 0))
dat2
# # A tibble: 54 x 2
# V1 V2
# <date> <dbl>
# 1 2013-10-06 1
# 2 2013-10-13 0
# 3 2013-10-20 0
# 4 2013-10-27 0
# 5 2013-11-03 0
# 6 2013-11-10 0
# 7 2013-11-17 0
# 8 2013-11-24 0
# 9 2013-12-01 0
# 10 2013-12-08 0
# # ... with 44 more rows
Data
dat <- read.table(text = "'2013-10-6' 1
'2014-08-03' 1
'2014-10-12' 1",
header = FALSE, stringsAsFactors = FALSE)
Upvotes: 1