Missing values - Arima model

Question

I have a daily time series of a product's sales, running from 01/01/2016 to 31/08/2017.

The data follow a six-day week (Monday to Saturday), so there is no data for Sundays. I understand that before fitting an ARIMA model I first need to fill in the missing values. This is where I need help: I've read that I can fill them with na.approx or with NA, but I don't know how to do that.

You can see my series here:

https://drive.google.com/file/d/0BzIf8XvzKOGWSm1ucUdYUVhfVGs/view?usp=sharing

As you can see, there is no data for Sundays. I need to know how to fill the missing values so that I can fit an ARIMA model and forecast the rest of 2017.

acylam · Accepted Answer

Here are three ways of doing it:

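library(lubridate)  # mdy(), wday()
library(xts)        # xts(); also loads zoo, which provides na.approx()
library(dplyr)      # filter(), mutate(), %>%
library(forecast)   # auto.arima(), forecast()

# df is assumed to be a data frame with a Date column (month/day/year strings)
# and a numeric Units column
df$Date = mdy(df$Date)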

Remove Sundays:

# wday() returns 1 for Sunday by default
ts_no_sunday = df %>%
  filter(wday(Date) != 1) %>%
  {xts(.$Units, .$Date)}

plot(ts_no_sunday)

no_sunday_arima = auto.arima(ts_no_sunday)

plot(forecast(no_sunday_arima, h = 10))
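
The h = 10 above is only an illustrative horizon. To cover the rest of 2017 with the Sundays-removed series, h would be the number of non-Sunday days left in the year. A minimal base-R sketch, assuming the series ends on 31/08/2017 as stated in the question (the variable names here are made up for illustration):

```r
# Remaining calendar days of 2017 after the last observation (31/08/2017)
remaining = seq(as.Date("2017-09-01"), as.Date("2017-12-31"), by = "day")

# as.POSIXlt()$wday is 0 for Sunday, independent of locale
working_days = remaining[as.POSIXlt(remaining)$wday != 0]

# Number of Monday-to-Saturday days still to forecast
h_rest_of_year = length(working_days)
print(h_rest_of_year)
```

h_rest_of_year could then be passed as h to forecast(). For the series that keep Sundays (as NA or interpolated), the horizon would instead be length(remaining).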

Replace Sundays with NAs:

# Keep the Sunday rows (df is assumed to contain them) but set their values to NA
ts_sunday = df %>%
  mutate(Units = replace(Units, wday(Date) == 1, NA)) %>%
  {xts(.$Units, .$Date)}

plot(ts_sunday)

sunday_arima = auto.arima(ts_sunday)

plot(forecast(sunday_arima, h = 10))

Interpolate Sundays:

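# Set Sundays to NA, then fill them by linear interpolation;
# na.rm = FALSE keeps any leading/trailing NAs so the column length is unchanged
ts_interp = df %>%
  mutate(Units = replace(Units, wday(Date) == 1, NA),
         Units = na.approx(Units, na.rm = FALSE)) %>%
  {xts(.$Units, .$Date)}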

plot(ts_interp)

interp_arima = auto.arima(ts_interp)

plot(forecast(interp_arima, h = 10))
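
For reference, na.approx() (from zoo, which xts loads) fills each NA by linear interpolation between its neighbours. A small sketch on a made-up vector, not the sales data:

```r
library(zoo)

units = c(10, NA, 14, 16, NA, 20)   # toy values with two gaps

# Each NA is replaced by the straight-line value between its neighbours
filled = na.approx(units)
print(filled)   # 10 12 14 16 18 20

# Leading/trailing NAs cannot be interpolated and are dropped by default,
# which shortens the vector; na.rm = FALSE keeps them as NA instead
edge = c(NA, 10, NA, 14)
length(na.approx(edge))                  # 3
length(na.approx(edge, na.rm = FALSE))   # 4
```

The length change matters inside mutate(), which expects the new column to be as long as the data frame.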

Notes:

As you can see, the three approaches produce different forecasts. This is because the first series is irregularly spaced (Sundays are skipped), the second is a regular series with missing values, and the third is a regular series with interpolated values. In my opinion, interpolating before fitting is the better way to deal with missing values, since ARIMA assumes that the time series is regularly spaced. However, this also depends on whether your "missing" data points are genuinely missing or simply reflect a stop in activity. The former should be treated with interpolation; for the latter, you might be better off removing Sundays and treating the time series as if Sundays don't exist.
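
If Sundays are treated as nonexistent, one option that is not covered above (so take it as a sketch, not a recommendation) is to store the Sundays-removed values as a plain ts with frequency = 6, so that auto.arima() can consider a weekly seasonal pattern over the six trading days. The numbers below are simulated, since the real data are not reproduced here, and all variable names are made up:

```r
library(forecast)

set.seed(1)
# Simulated stand-in for 20 six-day weeks of sales (Monday to Saturday only)
weekly_pattern = c(100, 110, 105, 120, 140, 160)
units = rep(weekly_pattern, times = 20) + rnorm(120, sd = 5)

# frequency = 6 tells ts() that one seasonal cycle is six observations
ts_six_day = ts(units, frequency = 6)

fit = auto.arima(ts_six_day)
fc = forecast(fit, h = 12)   # two six-day weeks ahead
plot(fc)
```

Whether a seasonal model is actually selected depends on the data; with the real series, the fitted order may well differ.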

See this discussion on "How to handle nonexistent or missing data?" and this one on "Using the R forecast package with missing values and/or irregular time series".
