Reputation: 147

Time series forecasting by lm() using lapply

I was trying to forecast a time series problem using lm() and my data looks like below

Customer_key  date         sales
 A35          2018-05-13   31
 A35          2018-05-20   20
 A35          2018-05-27   43
 A35          2018-06-03   31
 BH22         2018-05-13   60
 BH22         2018-05-20   67
 BH22         2018-05-27   78
 BH22         2018-06-03   55

Converted my df to a list format by

df <- dcast(df, date ~ customer_key,value.var = c("sales"))
df <- subset(df, select = -c(dt))
demandWithKey <- as.list(df)

Trying to write a function such that applying this function across all customers

my_fun <- function(x) {
  fit <- lm(ds_load ~ date, data=df) ## After changing to list ds_load and date column names
                                     ## are no longer available for formula
  fit_b <- forecast(fit$fitted.values, h=20) ## forecast using lm()
  return(data.frame(c(fit$fitted.values, fit_b[["mean"]])))
}

fcast <- lapply(df, my_fun)

I know the above function doesn't work, but basically I'm looking for getting both the fitted values and forecasted values for a grouped data. But I've tried all other methods using tslm() (converting into time series data) and so on but no luck I can get the lm() work somehow on just one customer though. Also many questions/posts were on just fitting the model but I would like to forecast too at same time.

Upvotes: 1

Answers (2)

jay.sf

Reputation: 72633

I don't know what you're up to exactly, but you could make this less complicated.

Using by avoids the need to reshape your data, it splits your data e.g. by customer ID as in your case and applies a function on the subsets (i.e. it's a combination of split and lapply; see ?by).

Since you want to compare fitted and forecasted values somehow in your result, you probably need predict rather than $fitted.values, otherwise the values won't be of same length. Because your independent variable is a date in weekly intervals, you may use seq.Date and take the first date as a starting value; the sequence has length actual values (nrow each customer) plus h= argument of the forecast.

For demonstration purposes I add the fitted values as first column in the following.

res <- by(dat, dat$cus_key, function(x) {
      H <- 20  ## globally define 'h'
      fit <- lm(sales ~ date, x)
      fitted <- fit$fitted.values
      pred <- predict(fit, newdata=data.frame(
        date=seq(x$date[1], length.out= nrow(x) + H, by="week")))
      fcst <- c(fitted, forecast(fitted, h=H)$mean)
      fit.na <- `length<-`(unname(fitted), length(pred))  ## for demonstration
      return(cbind(fit.na, pred, fcst))
      })

Result

res
# dat$cus_key: A28
#    fit.na  pred  fcst
# 1    41.4  41.4  41.4
# 2    47.4  47.4  47.4
# 3    53.4  53.4  53.4
# 4    59.4  59.4  59.4
# 5    65.4  65.4  65.4
# 6      NA  71.4  71.4
# 7      NA  77.4  77.4
# 8      NA  83.4  83.4
# 9      NA  89.4  89.4
# 10     NA  95.4  95.4
# 11     NA 101.4 101.4
# 12     NA 107.4 107.4
# 13     NA 113.4 113.4
# 14     NA 119.4 119.4
# 15     NA 125.4 125.4
# 16     NA 131.4 131.4
# 17     NA 137.4 137.4
# 18     NA 143.4 143.4
# 19     NA 149.4 149.4
# 20     NA 155.4 155.4
# 21     NA 161.4 161.4
# 22     NA 167.4 167.4
# 23     NA 173.4 173.4
# 24     NA 179.4 179.4
# 25     NA 185.4 185.4
# ---------------------------------------------------------------- 
# dat$cus_key: B16
#    fit.na pred fcst
# 1    49.0 49.0 49.0
# 2    47.7 47.7 47.7
# 3    46.4 46.4 46.4
# 4    45.1 45.1 45.1
# 5    43.8 43.8 43.8
# 6      NA 42.5 42.5
# 7      NA 41.2 41.2
# 8      NA 39.9 39.9
# 9      NA 38.6 38.6
# 10     NA 37.3 37.3
# 11     NA 36.0 36.0
# 12     NA 34.7 34.7
# 13     NA 33.4 33.4
# 14     NA 32.1 32.1
# 15     NA 30.8 30.8
# 16     NA 29.5 29.5
# 17     NA 28.2 28.2
# 18     NA 26.9 26.9
# 19     NA 25.6 25.6
# 20     NA 24.3 24.3
# 21     NA 23.0 23.0
# 22     NA 21.7 21.7
# 23     NA 20.4 20.4
# 24     NA 19.1 19.1
# 25     NA 17.8 17.8
# ---------------------------------------------------------------- 
# dat$cus_key: C12
#    fit.na  pred  fcst
# 1    56.4  56.4  56.4
# 2    53.2  53.2  53.2
# 3    50.0  50.0  50.0
# 4    46.8  46.8  46.8
# 5    43.6  43.6  43.6
# 6      NA  40.4  40.4
# 7      NA  37.2  37.2
# 8      NA  34.0  34.0
# 9      NA  30.8  30.8
# 10     NA  27.6  27.6
# 11     NA  24.4  24.4
# 12     NA  21.2  21.2
# 13     NA  18.0  18.0
# 14     NA  14.8  14.8
# 15     NA  11.6  11.6
# 16     NA   8.4   8.4
# 17     NA   5.2   5.2
# 18     NA   2.0   2.0
# 19     NA  -1.2  -1.2
# 20     NA  -4.4  -4.4
# 21     NA  -7.6  -7.6
# 22     NA -10.8 -10.8
# 23     NA -14.0 -14.0
# 24     NA -17.2 -17.2
# 25     NA -20.4 -20.4

As you can see, prediction and forecast yield the same values, since both methods are based on the same single explanatory variable date in this case.

Toy data:

set.seed(42)
dat <- transform(expand.grid(cus_key=paste0(LETTERS[1:3], sample(12:43, 3)),
                             date=seq.Date(as.Date("2018-05-13"), length.out=5, by="week")),
                 sales=sample(20:80, 15, replace=TRUE))

Upvotes: 0

nadjib amrouche

Reputation: 41

lm() is for a regression model but here you have a time serie so for forecasting the serie you have to use one of the time serie model (ARMA ARCH GARCH...) so you can use the function in r : auto.arima() in "forecast" package

Upvotes: 0

Time series forecasting by lm() using lapply

Answers (2)

Result

Related Questions