dartdog

Reputation: 10862

CSV input to R Forecast with dates via R studio?

I have a very simple csv file I'm trying to experiment with different forecast methods on.

          Year   total UnemplRt
   1  12/31/2013    NA      7.1
   2  12/31/2012 39535      8.3
   3  12/31/2011 36965     10.0
   4  12/31/2010 36234     10.9
   5  12/31/2009 37918      8.5
   6  12/31/2008 42235      4.3
   7  12/31/2007 55698      3.7
   8  12/31/2006 58664      3.8
   9  12/31/2005 59674      4.7
   10 12/31/2004 51439      5.7 

When I import it using RStudio, I get the list above, which has just the list name and column headers that I can't seem to reference.

I am a total newbie at R, but I gather I should have a data frame and that the first column should be a date type. I don't know how to get there from here. Also, is that the correct layout for input to forecast?

How can I use forecast (multiple models) to take rows 10-4 and forecast "total" for row 3, using the UnemplRt for row 3 (which is known in advance), and so on, i.e. rows 10-3 to forecast row 2 and rows 10-2 to forecast row 1, which of course will be the forecast for the upcoming year? I've got it working as a straight linear regression in a spreadsheet, but the result comes out too high, so I'm looking for methods that weight recent data more heavily and follow the curve rather than just a straight line.

This is horribly simplistic but hopefully generic enough that others will find the answer useful as well.

Upvotes: 1

Views: 2072

Answers (1)

Jochem

Reputation: 3387

I am not 100% sure what you are asking, but I assume you would like to build a time series model with a regression component. Below is an overview of building a simple time series model and one that includes a regressor.

# load the base data as presented in the question
Workbook1 <- structure(list(Year = structure(1:10, .Label = c("31-Dec-04", 
"31-Dec-05", "31-Dec-06", "31-Dec-07", "31-Dec-08", "31-Dec-09", 
"31-Dec-10", "31-Dec-11", "31-Dec-12", "31-Dec-13"), class = "factor"), 
    total = c(51439L, 59674L, 58664L, 55698L, 42235L, 37918L, 
    36234L, 36965L, 39535L, NA), UnemplRt = c(5.7, 4.7, 3.8, 
    3.7, 4.3, 8.5, 10.9, 10, 8.3, 7.1)), .Names = c("Year", "total", 
"UnemplRt"), class = "data.frame", row.names = c(NA, -10L))

# Make a time series out of the value
dependent <- ts(Workbook1[1:9,]$total, start=c(2004), frequency=1)

# load the forecast package (library() stops with an error if it is not installed)
library(forecast)

# fit a model automatically; other models are available as well.
# It is best to study the forecast package documentation for the alternatives.
fit <- auto.arima(dependent)

# do the actual forecast
fcast <- forecast(fit)

# here some results of the forecast
fcast
     Point Forecast    Lo 80    Hi 80     Lo 95    Hi 95
2013          39535 31852.42 47217.58 27785.501 51284.50

# You can plot the forecast as follows:
plot(fcast)
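
The question starts from a CSV file rather than a pasted structure, so here is a minimal sketch of getting from the file to a data frame with a real Date column and a time series. It uses base R only. The inline `csv_text` is just a copy of the table in the question, and the comma-separated layout is an assumption about the actual file; with a real file you would pass its path to `read.csv()` instead of `text = csv_text`.

```r
# Inline copy of the question's table (assumed to be comma-separated).
csv_text <- "Year,total,UnemplRt
12/31/2013,NA,7.1
12/31/2012,39535,8.3
12/31/2011,36965,10.0
12/31/2010,36234,10.9
12/31/2009,37918,8.5
12/31/2008,42235,4.3
12/31/2007,55698,3.7
12/31/2006,58664,3.8
12/31/2005,59674,4.7
12/31/2004,51439,5.7"

df <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Convert the month/day/year strings to Date and sort oldest first,
# because ts() expects observations in chronological order.
df$Year <- as.Date(df$Year, format = "%m/%d/%Y")
df <- df[order(df$Year), ]

# Columns can now be referenced by name, e.g. df$total or df$UnemplRt.
first_year <- as.integer(format(df$Year[1], "%Y"))

# Annual series of the known totals (the 2013 value is still NA, so drop it).
total_ts <- ts(df$total[!is.na(df$total)], start = first_year, frequency = 1)

str(df)
```

The resulting `total_ts` plays the same role as `dependent` above, so it could be passed to `auto.arima()` in the same way.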

Since you include unemployment rate figures, I assume you might want to use them in your forecast through some sort of regression model. Below is one way to approach this:

# load the independent variable: history (2004-2012) and the known future value (2013)
unemployment <- ts(Workbook1[1:9,]$UnemplRt, start=c(2004), frequency=1)
unemployment_future <- ts(Workbook1[10:10,]$UnemplRt, start=c(2013), frequency=1)

# make a model that fits the history
fit2 <- auto.arima(dependent, xreg=unemployment)

# generate a forecast with the already known unemployment rate for 2013.
fcast2 <- forecast(fit2,xreg=unemployment_future)

Here is the result of the forecast; again, you can plot it as above.

fcast2
     Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
2013       45168.02 38848.92 51487.12 35503.79 54832.25
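
The question also describes an expanding-window scheme: fit on rows 10-4 to predict row 3, then on rows 10-3 to predict row 2, and so on. Here is a rough sketch of that loop. To keep it runnable with base R only, it uses a plain `lm()` regression as a stand-in model, which is an assumption and not the ARIMA model from above. The same loop structure should carry over if you swap in `auto.arima(..., xreg=)` and `forecast(..., xreg=)`. The names `targets` and `rolling` are made up for illustration.

```r
# Data from the question, oldest year first (the 2013 total is not yet known).
df <- data.frame(
  year     = 2004:2013,
  total    = c(51439, 59674, 58664, 55698, 42235, 37918, 36234, 36965, 39535, NA),
  UnemplRt = c(5.7, 4.7, 3.8, 3.7, 4.3, 8.5, 10.9, 10.0, 8.3, 7.1)
)

# Expanding window: for each target year, fit on all earlier years and
# predict the target from its already-known unemployment rate.
targets <- 2011:2013
rolling <- do.call(rbind, lapply(targets, function(target) {
  train <- df[df$year < target, ]
  test  <- df[df$year == target, ]
  fit   <- lm(total ~ UnemplRt, data = train)  # stand-in model (assumption)
  data.frame(
    year      = target,
    predicted = unname(predict(fit, newdata = test)),
    actual    = test$total
  )
}))

# One row per target year; 'actual' is NA for 2013.
rolling
```

Comparing `predicted` with `actual` for 2011 and 2012 gives a rough idea of how far off a given model would have been before trusting its 2013 figure.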

Hope the above helps.

Upvotes: 6
