DeathbyGreen
DeathbyGreen

Reputation: 347

R Prophet add_regressor giving strange results

I'm trying (for the first time) to add an external variable to prophet with the add_regressor function, but the results I'm getting look wild. The dataset I'm using is freely available on kaggle (the well known shampoo sales) here. I'm attempting to use freely available data for the SPY stock index using R's quantmod package as my external variable.

Here's how I start the code:

library(prophet)
library(quantmod)
library(dplyr)


df <- read.csv("~/shampoo.csv")


#now get the min and max dates in the column
min_date <- min(df$Date, na.rm = TRUE)
max_date <- max(df$Date, na.rm = TRUE)


#download the SPY stock data
getSymbols("SPY", from  = min_date, to = max_date)

#SPY closes stored into a df and massage a bit
Close <- data.frame(Cl(SPY))
Close <- cbind(ds = rownames(Close), Close)
rownames(Close) <- NULL
Close_no_rename <- Close
colnames(Close)[colnames(Close) == 'SPY.Close'] <- 'y'
colnames(Close_no_rename)[colnames(Close_no_rename) == 'SPY.Close'] <- 'SPY_CLOSE'


#now put this into prophet and make a forecast for the forecast_period for SPY
stock_model <- prophet(Close)


#make a forecast dataframe
future_stocks <- make_future_dataframe(stock_model, periods = 30, freq = "month", include_history = FALSE)

#the below df will have predicted stock prices of the SPY. want to extract the future y values as point forecast along with dates
forecast <- predict(stock_model, future_stocks) %>% select(ds, yhat)
colnames(forecast)[colnames(forecast) == 'yhat'] <- 'SPY_CLOSE'


#rename the columns of the actual df
colnames(df)[colnames(df) == 'Date'] <- 'ds'
colnames(df)[colnames(df) == 'Value'] <- 'y'

#now want to merge the Close df y historic values onto the training df, merge by date ds column
df_historic_with_SPY_close <- merge(df, Close_no_rename, by = "ds")


#now actually forecast using prophet
model <- prophet()

#add the SPY regressor
model <- add_regressor(model, 'SPY_CLOSE', prior.scale = 0.0000001, standardize = FALSE)
model <- fit.prophet(model, df_historic_with_SPY_close)


forecast_final <- predict(model, forecast)

plot(model, forecast_final)

This does not throw any errors but the plot of the forecast looks...wrong. It looks as if the scale is off or something. I tried fiddling with the prior and standardize settings with no luck. Thanks for any help! enter image description here

Here is the shampoo dataset being used as the main variable:

Date    Value
2017-01-01  266
2017-02-01  145.9
2017-03-01  183.1
2017-04-01  119.3
2017-05-01  180.3
2017-06-01  168.5
2017-07-01  231.8
2017-08-01  224.5
2017-09-01  192.8
2017-10-01  122.9
2017-11-01  336.5
2017-12-01  185.9
2018-01-01  194.3
2018-02-01  149.5
2018-03-01  210.1
2018-04-01  273.3
2018-05-01  191.4
2018-06-01  287
2018-07-01  226
2018-08-01  303.6
2018-09-01  289.9
2018-10-01  421.6
2018-11-01  264.5
2018-12-01  342
2019-01-01  339.7
2019-02-01  440.4
2019-03-01  315.9
2019-04-01  439.3
2019-05-01  401.3
2019-06-01  437.4
2019-07-01  575.5
2019-08-01  407.6
2019-09-01  682
2019-10-01  475.3
2019-11-01  581.3
2019-12-01  646.9

Upvotes: 3

Views: 1456

Answers (1)

Grubbmeister
Grubbmeister

Reputation: 877

I think I have fixed things, but the only thing I did differently was change the dates from factor format to date format, and tell R to use the select function from dplyr. I also ran R without any other packages loaded. So, it's still a bit of a mystery why this worked.

I tumbled to this issue when the line

df_historic_with_SPY_close <- merge(df, Close_no_rename, by = "ds")

didn't work properly. I discovered it was because I had formatted the df dates as dates to get them to work with getSymbols but then they were different from Close_no_rename.

First, the data I used:

df<-dput(df)
structure(list(ds = structure(c(17167, 17198, 17226, 17257, 17287, 
17318, 17348, 17379, 17410, 17440, 17471, 17501, 17532, 17563, 
17591, 17622, 17652, 17683, 17713, 17744, 17775, 17805, 17836, 
17866, 17897, 17928, 17956, 17987, 18017, 18048, 18078, 18109, 
18140, 18170, 18201, 18231), class = "Date"), y = c(266, 145.9, 
183.1, 119.3, 180.3, 168.5, 231.8, 224.5, 192.8, 122.9, 336.5, 
185.9, 194.3, 149.5, 210.1, 273.3, 191.4, 287, 226, 303.6, 289.9, 
421.6, 264.5, 342.3, 339.7, 440.4, 315.9, 439.3, 401.3, 437.4, 
575.5, 407.6, 682, 475.3, 581.3, 646.9)), row.names = c(NA, -36L
), class = "data.frame")


    library(prophet)
    library(quantmod)
    library(dplyr)
    
# can use your df, rather than above
    df<-read.csv("~/shampoo.csv")
# either way, should run this
    df$Date<-as.Date.factor(df$Date,tryFormats = c("%d-%m-%y"))
    str(df) #check
    
    #now get the min and max dates in the column
    min_date <- min(df$Date, na.rm = TRUE)
    max_date <- max(df$Date, na.rm = TRUE)
    
    
    #download the SPY stock data
    getSymbols("SPY", from  = min_date, to = max_date)
    
    #SPY closes stored into a df and massage a bit
    Close <- data.frame(Cl(SPY))
    Close <- cbind(ds = rownames(Close), Close)
    rownames(Close) <- NULL
    Close_no_rename <- Close
    colnames(Close)[colnames(Close) == 'SPY.Close'] <- 'y'
    colnames(Close_no_rename)[colnames(Close_no_rename) == 'SPY.Close'] <- 'SPY_CLOSE'

# make dates in date format 

    Close_no_rename$ds<-as.Date(Close_no_rename$ds)
    str(Close_no_rename)
    
    #now put this into prophet and make a forecast for the forecast_period for SPY
    stock_model <- prophet(Close)
    
    
    #make a forecast dataframe
    future_stocks <- make_future_dataframe(stock_model, periods = 30, freq = "month", include_history = FALSE)
    
    #the below df will have predicted stock prices of the SPY. want to extract the future y values as point forecast along with dates
# specify dplyr:::select
    forecast <- predict(stock_model, future_stocks) %>% dplyr:::select(ds, yhat)
    colnames(forecast)[colnames(forecast) == 'yhat'] <- 'SPY_CLOSE'
    
    
    #rename the columns of the actual df
    colnames(df)[colnames(df) == 'Date'] <- 'ds'
    colnames(df)[colnames(df) == 'Value'] <- 'y'
    
    #now want to merge the Close df y historic values onto the training df, merge by date ds column
    df_historic_with_SPY_close <- merge(df, Close_no_rename, by = "ds")
    df_historic_with_SPY_close
    
    #now actually forecast using prophet
    model <- prophet()
    
    #add the SPY regressor
    model <- add_regressor(model, 'SPY_CLOSE', prior.scale = 0.0000001, standardize = FALSE)
    model <- fit.prophet(model, df_historic_with_SPY_close)
    
    
    forecast_final <- predict(model, forecast)
    
    plot(model, forecast_final)

Result:

enter image description here

EDIT

Using the following dataset, with dates changed to one that is close in the SPY dataset:

df<-dput(df)
structure(list(ds = structure(c(17169, 17198, 17226, 17259, 17287, 
17318, 17350, 17379, 17410, 17442, 17471, 17501, 17534, 17563, 
17591, 17624, 17652, 17683, 17715, 17744, 17778, 17805, 17836, 
17868, 17898, 17928, 17956, 17987, 18017, 18050, 18078, 18109, 
18142, 18170, 18201, 18232), class = "Date"), y = c(266, 145.9, 
183.1, 119.3, 180.3, 168.5, 231.8, 224.5, 192.8, 122.9, 336.5, 
185.9, 194.3, 149.5, 210.1, 273.3, 191.4, 287, 226, 303.6, 289.9, 
421.6, 264.5, 342.3, 339.7, 440.4, 315.9, 439.3, 401.3, 437.4, 
575.5, 407.6, 682, 475.3, 581.3, 646.9)), class = "data.frame", row.names = c(NA, 
-36L))

We get this, which looks much better:

enter image description here

EDIT 2

The problem is to do with missing data. Some dates in the shampoo dataset are not in the SPY dataset. The following will select the data from the nearest date in the SPY dataset to overcome the problem of missing data. However, the graph it generates still looks odd, and changing the dates a little seems to be cause of the problem.

Replacing the line:

df_historic_with_SPY_close <- merge(df, Close_no_rename, by = "ds")

With (credit to https://stackoverflow.com/a/28073823/7967291):

library(data.table)
setDT(Close_no_rename)
setDT(df)

setkey(Close_no_rename, ds)[, dateMatch:=ds]
df_historic_with_SPY_close<-Close_no_rename[df, roll='nearest']
df_historic_with_SPY_close<-setDT(df_historic_with_SPY_close)[,-1]
names(df_historic_with_SPY_close)[names(df_historic_with_SPY_close) == "dateMatch"] <- "ds"
df_historic_with_SPY_close
df_historic_with_SPY_close <- mutate (  df_historic_with_SPY_close,  ds = ymd(ds) )
str(df_historic_with_SPY_close)

Upvotes: 1

Related Questions