Format historical data for forecasting with calendar variables

Question

I have hourly time series data for 2015 representing the power consumption of a large commercial building. I want to use it to predict usage for 2016, so I need to arrange the data in a form suitable for fitting a forecasting model.

I plan to use the following features to predict 2016 usage: (1) day of week, (2) time of day, (3) temperature, and (4) 2015 usage. I can create the first three features, but the fourth one is tricky.

How should I arrange the 2015 data so that, for a particular day in 2016, I can use the data from the corresponding day in 2015? My concerns are:

1. I should not use 2015 weekend data to predict usage on a 2016 working day.
2. On some days in 2015, data is missing for the entire day. How should I account for these missing readings when predicting the corresponding day in 2016?
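To make the two concerns concrete, here is a minimal sketch of one possible alignment, not a definitive method. It assumes that "corresponding day" can mean the same weekday 52 weeks (364 days) earlier rather than the same calendar date, and that a fully missing day can be replaced by the same weekday one week earlier or later. The helper `last_year_usage`, the data frame `use2015`, and the simulated gap on 2015-10-04 are all hypothetical names and values introduced for illustration. It uses base R with UTC timestamps to sidestep daylight-saving shifts.

```r
# Dummy hourly 2015 usage (simulated, as in the question)
set.seed(123)
t2015 <- seq(as.POSIXct("2015-01-01 00:00", tz = "UTC"),
             as.POSIXct("2015-12-31 23:00", tz = "UTC"), by = "hour")
use2015 <- data.frame(time = t2015, usage = rnorm(length(t2015), 150, 5))

# Simulate one fully missing day (hypothetical gap, for illustration only)
gap <- as.Date(use2015$time, tz = "UTC") == as.Date("2015-10-04")
use2015$usage[gap] <- NA

# Hypothetical helper: look up last year's usage for each target timestamp.
# A lag of 364 days is exactly 52 weeks, so the weekday always matches.
# If that value is missing, fall back to 371 and then 357 days
# (one week further back or forward, still the same weekday).
last_year_usage <- function(target, hist, lags_days = c(364, 371, 357)) {
  out <- rep(NA_real_, length(target))
  for (lag in lags_days) {
    idx <- match(as.numeric(target) - lag * 86400, as.numeric(hist$time))
    fill <- is.na(out) & !is.na(idx)
    out[fill] <- hist$usage[idx[fill]]
  }
  out
}

# 2 Oct 2016 is a Sunday; 364 days earlier is Sunday 4 Oct 2015 (the gap),
# so the helper falls back to Sunday 27 Sep 2015.
target <- seq(as.POSIXct("2016-10-02 00:00", tz = "UTC"),
              as.POSIXct("2016-10-02 23:00", tz = "UTC"), by = "hour")
lagged <- last_year_usage(target, use2015)
```

Whether a one-week shift is an acceptable substitute depends on the building: public holidays and seasonal changes are not handled here and would need their own rule.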

Here is dummy data for 2015 and 2016:

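library(xts)
set.seed(123)
# Hourly timestamps for all of 2015 (through 23:00 on 31 Dec)
seq1 <- seq(as.POSIXct("2015-01-01 00:00"), as.POSIXct("2015-12-31 23:00"), by = "hour")
data1 <- xts(rnorm(length(seq1), 150, 5), seq1)
# Hourly timestamps for January through September 2016
seq2 <- seq(as.POSIXct("2016-01-01 00:00"), as.POSIXct("2016-09-30 23:00"), by = "hour")
data2 <- xts(rnorm(length(seq2), 140, 5), seq2)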

Let me give an example to clarify my problem:

Suppose the model is: lm(output ~ dayofweek + timeofday + temperature + lastyearusage, data = xxx)
Now suppose I want to predict the usage on 2 Oct 2016 (dayY) using the lastyearusage from 2 Oct 2015 (dayX). The issue in this step is: how do I ensure that dayX is not a weekend day if dayY is a working day? I am sure that if I use dayX to predict dayY without checking the day type, the output will get messy.
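For reference, a model frame with these four predictors could be assembled roughly as follows. This is only a sketch on simulated values: the `temperature` and `lastyearusage` columns are random placeholders (no real weather or aligned 2015 data is used), and the data frame name `train` is an assumption standing in for `xxx`.

```r
# Hourly timestamps for the 2016 training period (UTC avoids DST gaps)
set.seed(123)
time <- seq(as.POSIXct("2016-01-01 00:00", tz = "UTC"),
            as.POSIXct("2016-09-30 23:00", tz = "UTC"), by = "hour")
n <- length(time)
lt <- as.POSIXlt(time)

train <- data.frame(
  output        = rnorm(n, 140, 5),   # dummy 2016 usage
  # Weekday as a locale-independent factor (wday: 0 = Sunday)
  dayofweek     = factor(lt$wday, levels = 0:6,
                         labels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")),
  timeofday     = factor(lt$hour),    # hour of day, 0 to 23, as a factor
  temperature   = rnorm(n, 20, 5),    # placeholder, not real weather data
  lastyearusage = rnorm(n, 150, 5)    # placeholder for weekday-aligned 2015 usage
)

fit <- lm(output ~ dayofweek + timeofday + temperature + lastyearusage,
          data = train)

# Predict a single new hour; factor levels must match those used in training
newdata <- data.frame(
  dayofweek     = factor("Sun", levels = levels(train$dayofweek)),
  timeofday     = factor("14", levels = levels(train$timeofday)),
  temperature   = 18,
  lastyearusage = 152
)
pred <- predict(fit, newdata)
```

Treating day of week and hour as factors lets each level get its own coefficient, which is one common choice; it is not the only way to encode calendar effects.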

