Reputation: 203
I'm trying to compare two different prediction models by calculating the mean squared prediction error.
Here is the link to the earnings.csv dataset
Use the 2000-2006 data as the training dataset and the 2007-2008 data as the testing dataset.
Use both the decompose() and stl() functions in R to decompose the time series into trend, seasonal and error components.
Use the lm() function in R to fit a linear model to the trend component. Then predict the monthly earnings for 2007-2008. Compare the predictions under the two methods with the test dataset, both graphically and by calculating the sample mean squared prediction error (MSPE).
Discuss the stochastic properties of the error term under the two decomposition methods.
Here is my R code:
Data <-read.table("earnings.csv",header = T,sep=",")
tsData <- ts(Data$X,start = 2000, frequency = 12)
plot(tsData,xlab= "Month",ylab = "Earnings")
tsData = log(tsData)
trainingSet = window(tsData,start=2000,end=c(2006,12))
testSet = window(tsData,start=2007,end=c(2008,12))
decompTS =decompose(trainingSet)
stltraining = stl(trainingSet,s.window = "periodic")
lm1 = lm(trainingSet~decompTS$trend)
lm1
lm2 = lm(trainingSet~stltraining$time.series[,2])
lm2
decompTest=(decompose(testSet))$trend
pred1=lm1$coefficients*decompTest
pred2=predict(lm2,decompTest)
plot(pred1)
plot(pred2)
mspe1=mean((testSet-pred1)^2)
The value I got for mspe1 is null. What did I do wrong here? Thanks for any help.
Reputation: 23109
I'm not sure about the theoretical soundness of your approach, but I tried to fix the issues with your code. The main issues are converting from a ts object to numeric and back, and removing the NA values that result from the prediction.
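To see where the NA values come from, here is a minimal sketch on a synthetic two-year monthly series (the data and the variable names are made up for illustration; they are not from earnings.csv). decompose() estimates the trend with a centred moving average, so for monthly data it has no trend value for the first and last six months, and mean() returns NA unless those are dropped.

```r
# Synthetic 24-month series (an assumption; stands in for the log earnings test set)
set.seed(1)
n <- 24
y <- ts(5 + 0.01 * seq_len(n) + 0.1 * sin(2 * pi * seq_len(n) / 12) +
          rnorm(n, sd = 0.02),
        start = 2007, frequency = 12)

# Trend from a centred moving average: undefined at both ends of the series
trend <- decompose(y)$trend
n_missing <- sum(is.na(trend))   # leading plus trailing months without a trend value

# Work on plain numeric vectors so ts attributes do not get in the way
err <- as.numeric(y) - as.numeric(trend)

mspe_raw   <- mean(err^2)                # NA, because err contains NA
mspe_clean <- mean(err^2, na.rm = TRUE)  # averages only months that have a trend value
```

The corrected script below applies both fixes: numeric vectors for lm() and predict(), and na.rm=TRUE when averaging.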
Data <-read.table("earnings.csv",header = T,sep=",")
tsData <- ts(Data$X,start = 2000, frequency = 12)
plot(tsData,xlab= "Month",ylab = "Earnings")
tsData = log(tsData)
trainingSet = window(tsData,start=2000,end=c(2006,12))
testSet = window(tsData,start=2007,end=c(2008,12))
decompTS =decompose(trainingSet)
stltraining = stl(trainingSet,s.window = "periodic")
x <- as.numeric(decompTS$trend)
y <- as.numeric(trainingSet) # convert the data from ts to numeric before lm
lm1 = lm(y~x)
lm1
# Coefficients:
# (Intercept)            x
#    -0.01082      1.00679
x <- as.numeric(stltraining$time.series[,2]) # STL trend (column 2), as numeric like the decompose() trend above
lm2 = lm(y~x)
lm2
# Coefficients:
# (Intercept)            x
#      0.0022       0.9984
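# trend of the test set; decompose() leaves NA for the first and last 6 months
decompTest=as.numeric((decompose(testSet))$trend)
# predict() needs a data frame whose column name matches the predictor (x);
# this replaces lm1$coefficients*decompTest from the question
pred1=predict(lm1,data.frame(x=decompTest))
pred2=predict(lm2,data.frame(x=decompTest))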
pred1TS = window(ts(pred1,start = 2007, frequency = 12),start=2007,end=c(2008,12)) # convert back to ts, for plotting
pred2TS = window(ts(pred2,start = 2007, frequency = 12),start=2007,end=c(2008,12))
library(xts)
plot(as.xts(testSet), main='TestSet')
lines(as.xts(pred1TS), col='red', pch=19, lwd=2)
lines(as.xts(pred2TS), col='green', pch=19, lwd=2)
mspe1=mean((as.numeric(testSet)-pred1)^2, na.rm=TRUE)
# [1] 0.01865166
mspe2=mean((as.numeric(testSet)-pred2)^2, na.rm=TRUE)
# [1] 0.018508
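On the theoretical side, one possible reading of "fit a linear model to the trend component" is to regress the training trend on time and extrapolate it into 2007-2008, so that nothing from the test set is used to build the prediction. The following is only a sketch of that idea under stated assumptions: the series is synthetic, the names train_df, trend_lm and pred are hypothetical, and only the stl() variant is shown.

```r
# Synthetic log-earnings-like series for 2000-2008 (assumed data, not earnings.csv)
set.seed(42)
tt <- seq_len(108)
series <- ts(1 + 0.01 * tt + 0.2 * sin(2 * pi * tt / 12) + rnorm(108, sd = 0.05),
             start = 2000, frequency = 12)
train <- window(series, start = 2000, end = c(2006, 12))
test  <- window(series, start = 2007, end = c(2008, 12))

# Trend and seasonal components estimated from the training data only
fit_stl <- stl(train, s.window = "periodic")
train_df <- data.frame(t = as.numeric(time(train)),
                       trend = as.numeric(fit_stl$time.series[, "trend"]))

# Linear model of the trend against time, extrapolated over the test period
trend_lm <- lm(trend ~ t, data = train_df)
trend_hat <- predict(trend_lm, newdata = data.frame(t = as.numeric(time(test))))

# With s.window = "periodic" the seasonal pattern is the same every year,
# so the first 12 values (Jan-Dec 2000) can be reused by calendar month
seasonal <- as.numeric(fit_stl$time.series[1:12, "seasonal"])
pred <- trend_hat + seasonal[as.integer(cycle(test))]

# MSPE over all 24 test months; no NA values arise because nothing is
# decomposed on the test set
mspe <- mean((as.numeric(test) - pred)^2)
```

The same steps could be repeated with decompose(), whose trend has NA at both ends of the training window; lm() drops those rows by default.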