user59036

Reputation: 203

Assess prediction models by calculating mean squared prediction error

I'm trying to compare two different prediction models by calculating mean squared prediction error.

Here is the link to the earnings.csv dataset.

  1. Use 2000-2006 years of data as the training dataset and 2007-2008 data as the testing dataset.

  2. Use both the decompose() and stl() functions in R to decompose the time series into trend, seasonal and error components.

  3. Use the lm() function in R to fit a linear model to the trend component. Then predict the monthly earnings for 2007-2008. Compare the predictions under the two methods with the test dataset, both graphically and by calculating the sample mean squared prediction error (MSPE).

  4. Discuss the stochastic properties of the error term under the two decomposition methods.

Here is my R code:

Data <-read.table("earnings.csv",header = T,sep=",")
tsData <- ts(Data$X,start = 2000, frequency = 12)
plot(tsData,xlab= "Month",ylab = "Earnings")
tsData = log(tsData)

trainingSet = window(tsData,start=2000,end=c(2006,12))
testSet = window(tsData,start=2007,end=c(2008,12))
decompTS =decompose(trainingSet)
stltraining = stl(trainingSet,s.window = "periodic")

lm1 = lm(trainingSet~decompTS$trend)
lm1

lm2 = lm(trainingSet~stltraining$time.series[,2])
lm2


decompTest=(decompose(testSet))$trend
pred1=lm1$coefficients*decompTest
pred2=predict(lm2,decompTest)
plot(pred1)
plot(pred2)

mspe1=mean((testSet-pred1)^2)

The value I got for mspe1 is null. What did I do wrong here? Thanks for any help

Upvotes: 0

Views: 3512

Answers (1)

Sandipan Dey

Reputation: 23109

I'm not sure about the theoretical soundness of your approach, but I tried to fix the issues in your code. The main issues are converting the ts objects to plain numeric vectors before calling lm() (and back to ts for plotting), and removing the NA values that end up in the predictions.
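To show the NA problem in isolation, here is a minimal sketch. It assumes the built-in AirPassengers series as a stand-in for earnings.csv, which is not available here, so the numbers will differ from yours. decompose() estimates the trend with a centred moving average, so a monthly series has no trend value for its first and last six months, and mean() returns NA (which is presumably the "null" you saw) unless those values are dropped.

```r
# Illustration only: AirPassengers (shipped with base R) stands in for earnings.csv.
y <- log(AirPassengers)
train <- window(y, start = c(1949, 1), end = c(1958, 12))

trend <- decompose(train)$trend

# The centred moving average leaves 6 NAs at each end of a monthly series.
n_missing <- sum(is.na(trend))

# Any NA in the input makes mean() return NA unless na.rm = TRUE.
err <- as.numeric(train) - as.numeric(trend)
mse_with_na <- mean(err^2)
mse_no_na   <- mean(err^2, na.rm = TRUE)

print(n_missing)
print(mse_with_na)
print(mse_no_na)
```

The corrected version of your code below applies the same two ideas: as.numeric() before the arithmetic and na.rm=TRUE in mean().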

Data <-read.table("earnings.csv",header = T,sep=",")
tsData <- ts(Data$X,start = 2000, frequency = 12)
plot(tsData,xlab= "Month",ylab = "Earnings")
tsData = log(tsData)

trainingSet = window(tsData,start=2000,end=c(2006,12))
testSet = window(tsData,start=2007,end=c(2008,12))
decompTS =decompose(trainingSet)
stltraining = stl(trainingSet,s.window = "periodic")

x <- as.numeric(decompTS$trend) 
y <- as.numeric(trainingSet) # convert the data from ts to numeric before lm
lm1 = lm(y~x)
lm1
#Call:
#(Intercept)            x  
#   -0.01082      1.00679  

x <- stltraining$time.series[,2]
lm2 = lm(y~x)
lm2
#(Intercept)            x  
# 0.0022       0.9984  

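# Trend of the test period; decompose() leaves NA at both ends of the series
decompTest=as.numeric((decompose(testSet))$trend)
# Use predict() with a data frame whose column name matches the regressor x,
# instead of multiplying the coefficient vector by the trend
pred1=predict(lm1,data.frame(x=decompTest))
pred2=predict(lm2,data.frame(x=decompTest))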

pred1TS = window(ts(pred1,start = 2007, frequency = 12),start=2007,end=c(2008,12)) # convert back to ts, for plotting
pred2TS = window(ts(pred2,start = 2007, frequency = 12),start=2007,end=c(2008,12))
library(xts)
plot(as.xts(testSet), main='TestSet')
lines(as.xts(pred1TS), col='red', pch=19, lwd=2)
lines(as.xts(pred2TS), col='green', pch=19, lwd=2)

[Plot: TestSet, with pred1TS in red and pred2TS in green]

mspe1=mean((as.numeric(testSet)-pred1)^2, na.rm=TRUE)
# [1] 0.01865166
mspe2=mean((as.numeric(testSet)-pred2)^2, na.rm=TRUE)
# [1] 0.018508
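Note that the code above takes the trend of the test period from decompose(testSet), so the predictions rely on the test observations themselves. If the assignment means extrapolating the training trend instead, one possible alternative is sketched below. This is an assumption about the intent, not something stated in the question. It regresses each trend component on time, extends the fitted line over the test period, and adds the monthly seasonal effect back. The built-in AirPassengers series again stands in for earnings.csv, and forecast_from_components and mspe are hypothetical helper names introduced only for this illustration.

```r
# Illustration only: AirPassengers (shipped with base R) stands in for earnings.csv.
y <- log(AirPassengers)
train <- window(y, start = c(1949, 1), end = c(1958, 12))
test  <- window(y, start = c(1959, 1), end = c(1960, 12))

# Hypothetical helper: fit a straight line to a trend component as a function
# of time, extrapolate it over the test period, and add the seasonal effect
# for the matching calendar month.
forecast_from_components <- function(trend, seasonal, test) {
  d <- data.frame(t = as.numeric(time(trend)), trend = as.numeric(trend))
  fit <- lm(trend ~ t, data = d)  # rows with NA trend are dropped by lm()
  trend_hat <- predict(fit, data.frame(t = as.numeric(time(test))))
  month_effect <- tapply(as.numeric(seasonal), as.integer(cycle(seasonal)), mean)
  as.numeric(trend_hat) + as.numeric(month_effect[as.integer(cycle(test))])
}

dec <- decompose(train)
fit_stl <- stl(train, s.window = "periodic")

pred_dec <- forecast_from_components(dec$trend, dec$seasonal, test)
pred_stl <- forecast_from_components(fit_stl$time.series[, "trend"],
                                     fit_stl$time.series[, "seasonal"], test)

# Sample mean squared prediction error on the test period
mspe <- function(actual, predicted) mean((as.numeric(actual) - predicted)^2)
mspe_dec <- mspe(test, pred_dec)
mspe_stl <- mspe(test, pred_stl)
print(c(decompose = mspe_dec, stl = mspe_stl))
```

Because the predictions come only from the training data, they cover all 24 test months without NA values, so na.rm=TRUE is not needed in this variant.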

Upvotes: 0
