MasterShifu

Reputation: 233

Subsetting a timeseries changes date format in R

I am trying to use an ARIMA model on my monthly timeseries data, but I need to subset the timeseries to March - December of every year. I used the subset() function to do that, but it causes a weird change in the date format and also in the forecast.

When I remove the subset() code and run the model, the forecast output looks like this: (image: forecast output without subset())

But when I use subset(), the forecast output changes to: (image: forecast output with subset())

measure.ts = ts(measure.df[, 3], start = c(2017, 1), frequency = 12)
measure.ts = subset(measure.ts, month = c(3:12))

train <- head(measure.ts, 0.77 * length(measure.ts))
test <- tail(measure.ts, 0.23 * length(measure.ts))
fit <- arima(train, c(1, 1, 0), seasonal = list(order = c(1, 1, 0), period = 12), method = "ML")
fcast <- forecast(fit, h = 12)

I added measure.ts$ReportDate <- as.Date(measure.ts$ReportDate, format = "%m/%d/%Y") after subset(), but I get the error "$ operator is invalid for atomic vectors".

Upvotes: 0

Views: 129

Answers (1)

akrun

Reputation: 886948

measure.ts is a ts object, not a data.frame, so $ and data.frame-style [ indexing won't work on it. Functions such as index() or time() can extract the time attributes in a numeric format. It may be more convenient to convert it to an xts object:

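library(xts)
library(lubridate)  # assumed source of month(); it is not exported by xts
xt1 <- as.xts(measure.ts)
# > 3 keeps April-December, matching the output below; use >= 3 to include March
xt2 <- subset(xt1, month(time(xt1)) > 3)
head(xt2)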
#               [,1]
#Apr 2017 -0.5836272
#May 2017  0.8474600
#Jun 2017  0.2660220
#Jul 2017  0.4445853
#Aug 2017 -0.4664951
#Sep 2017 -0.8483700
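
As a side note on the time attributes mentioned above, here is a minimal base-R sketch of how they can be inspected without converting to xts. It uses the sample data from the "data" section at the end of this answer; the names keep and kept are only illustrative.

```r
# Sample data from the "data" section of this answer
set.seed(24)
measure.ts <- ts(rnorm(100), start = c(2017, 1), frequency = 12)

# time() gives decimal years; cycle() gives the position within
# the year (1 = Jan ... 12 = Dec for a monthly series)
head(time(measure.ts))
head(cycle(measure.ts))

# Logical mask for March through December
keep <- cycle(measure.ts) >= 3

# Plain [ indexing drops the ts attributes, so keep the time
# stamps next to the values explicitly
kept <- data.frame(
  time  = as.numeric(time(measure.ts))[keep],
  month = as.integer(cycle(measure.ts))[keep],
  value = as.numeric(measure.ts)[keep]
)
head(kept)
```

The result is an ordinary data.frame rather than a regular time series, which is one reason an irregular-friendly class such as xts can be easier to work with here.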

Using this data, we can forecast:

library(forecast)  # provides forecast()
train <- head(xt2, 0.77 * length(xt2))
test <- tail(xt2, 0.23 * length(xt2))
fit <- arima(train, c(1, 1, 0), seasonal = list(order = c(1, 1, 0),
  period = 12), method = "ML")
fcast <- forecast(fit, h = 12)

The post is confusing, because the image shown appears to be the forecast output, while the subset() code only changes the ts object. 'fcast' is a list, so its components need to be extracted:

fcst2 <- as.xts(fcast$mean)
index(fcst2) <- tail(time(test), 1) + seq(0.1, length.out = 12, by = .1)
fcst2
#                [,1]
#May 2025 -1.26601790
#Jun 2025 -0.83948223
#Jul 2025 -0.54097346
#Aug 2025 -2.27437406
#Oct 2025 -2.63901417
#Nov 2025 -1.71837725
#Dec 2025  0.05099788
#Jan 2026 -2.49037930
#Feb 2026 -1.29732565
#Apr 2026 -2.23682676
#May 2026 -1.81801742
#Jun 2026 -1.63090599
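
Note that the labels above skip Sep 2025 and Mar 2026. A yearmon index is stored in decimal years, so a step of 0.1 is a tenth of a year, not one month. The sketch below compares that step with a step of 1/12; it assumes the last test month is Apr 2025 (as the first label above suggests), and the names last_obs, tenth_steps, month_steps and month_num are illustrative only.

```r
library(zoo)  # as.yearmon(); attached automatically by library(xts)

# Apr 2025 as a decimal year (assumed last month of the test set)
last_obs <- 2025 + 3 / 12

# Step of 0.1 year: labels drift and some calendar months are skipped
tenth_steps <- as.yearmon(last_obs + seq(0.1, length.out = 12, by = 0.1))

# Step of 1/12 year: exactly one label per calendar month
month_steps <- as.yearmon(last_obs + seq_len(12) / 12)

# Month numbers (1 = Jan ... 12 = Dec) make the gaps easy to see
month_num <- function(x) as.integer(format(x, "%m"))
month_num(tenth_steps)
month_num(month_steps)
```

Which labelling is appropriate depends on how the twelve forecast steps should map onto the subsetted calendar, since the training data itself omits some months of each year.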

data

set.seed(24)
measure.ts=ts(rnorm(100),start = c(2017,1),frequency = 12)

Upvotes: 1
