Reputation: 27
I have a daily time series data set that I am attempting to perform an hourly linear interpolation on. My code is performing a linear interpolation between points but I need to start from 0 after the last point per ID and start of the new day.
Below is my output after adding missing hours to the daily raw data
Below is my output after performing the following code, but I am not sure how to start from 0:
dfYPO0_20171 <- dfYPO0_2017
%>% mutate(TIMESTAMP = as.POSIXct(as.character(TIMESTAMP)))
%>% group_by(ID)
%>% do(left_join(data.frame(ID= .$ID[1], TIMESTAMP = seq(min(.$TIMESTAMP), max(.$TIMESTAMP), by = "hour")), ., by=c("ID", "TIMESTAMP")))
%>% mutate(CALC_HOURLY_PROD= na.approx(.$"Total Prod Yest"))
Below is what I would like my output to look like:
Thanks for any help in advance!
Upvotes: 0
Views: 681
Reputation: 93871
Here's an approach using tidyverse
packages. First, we'll create groups based on the runs of missing values, then we'll use approx
to interpolate.
library(tidyverse)
# Fake data
dat = data.frame(time=seq(as.Date("2015-01-01"), as.Date("2015-01-31"), "1 day"),
prod=c(10.4, rep(NA,19), 25.8, rep(NA,9), 14.2))
dat = dat %>%
# Create groups based on runs of NA followed by a value
mutate(group = rev(cumsum(!is.na(rev(prod))))) %>%
# Operate on the groups we just created
group_by(group) %>%
# First, add a zero at the beginning of each group, then run the approx function
# to interpolate values for all groups of length greater than 1
mutate(prod = replace(prod, row_number()==1 & n()>1, 0),
prod = if(n()>1) approx(time, prod, xout=time)$y else prod) %>%
ungroup %>% select(-group)
time prod 1 2015-01-01 10.400000 2 2015-01-02 0.000000 3 2015-01-03 1.357895 ... 19 2015-01-19 23.084211 20 2015-01-20 24.442105 21 2015-01-21 25.800000 22 2015-01-22 0.000000 23 2015-01-23 1.577778 24 2015-01-24 3.155556 ... 29 2015-01-29 11.044444 30 2015-01-30 12.622222 31 2015-01-31 14.200000
Upvotes: 2