R forecastML package keeps renaming outcome columns

I am trying to use the forecastML R package to run some tests, but as soon as I reach this step, it renames the columns:

data <- read.csv("C:\\Users\\User\\Desktop\\DG ST Forecast\\LassoTemporalForecast.csv", header=TRUE)
date_frequency <- "1 week"
dates <- seq(as.Date("2012-10-05"), as.Date("2020-10-05"), by = date_frequency)
data_train <- data[1:357,]
data_test <- data[358:429,]

outcome_col <- 1  # The column index of the outcome.

horizons <- c(1,2,3,4,5,6,7,8,9,10,11,12)  # Forecast horizons of 1 through 12 time steps ahead.

# A lookback of a single time step in the past: only lag 1 of each feature is requested.
lookback <- c(1)

# A non-lagged feature that changes through time whose value we either know (e.g., month) or whose 
# value we would like to forecast.
dynamic_features <- colnames(data_train)

data_list <- forecastML::create_lagged_df(data_train,
                                          type = "train",
                                          outcome_col = 1,
                                          horizons = horizons,
                                          lookback = lookback,
                                          dates = dates[1:nrow(data_train)],
                                          frequency = date_frequency,
                                          dynamic_features = colnames(data_train)
)

After data_list is created, this is what appears in the console:

[screenshot of console output]

Next, when I try to create windows following the name change,

windows <- forecastML::create_windows(lagged_df = data_list, window_length = 36, 
                                      window_start = NULL, window_stop = NULL,
                                      include_partial_window = TRUE)
plot(windows, data_list, show_labels = TRUE)

I get this error: Can't subset columns that don't exist. x Column cases doesn't exist.

I've checked my input data and the preceding code many times and still can't understand why the name change occurs. If anyone is familiar with this package, please assist. Thank you!

Upvotes: 0

Views: 213

Answers (1)

Nick Redell

Reputation: 11

I'm the package author. It's difficult to tell without a reproducible example, but here's what I think is going on. Dynamic features are essentially features with a lag of 0. They also keep their original names, whereas lagged features have "_lag_n" appended to the feature name. By setting dynamic_features to all column names, you include the outcome column as well, so you end up with duplicate columns for the outcome. My guess is that "cases" is the outcome here. To fix this, replace dynamic_features = colnames(data_train) with only those features that you really want to have a lag of 0.
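As a rough sketch of that fix, the outcome can be dropped from the candidate names before they are used as dynamic features. The toy data frame below is only a stand-in for the real CSV: "cases" as the outcome is my guess from the error message, and the month and holiday columns are hypothetical names made up for illustration.

```r
# Toy stand-in for data_train. Only "cases" comes from the error message;
# "month" and "holiday" are hypothetical feature names.
data_train <- data.frame(
  cases   = c(10, 12, 9, 15),
  month   = c(1, 1, 2, 2),
  holiday = c(0, 1, 0, 0)
)

outcome_col <- 1
outcome_name <- colnames(data_train)[outcome_col]

# Start from every column except the outcome, so the outcome can never be
# added a second time as a lag-0 (dynamic) feature.
dynamic_features <- setdiff(colnames(data_train), outcome_name)

# Sanity check: the outcome must not appear among the dynamic features.
stopifnot(!(outcome_name %in% dynamic_features))

print(dynamic_features)
```

The resulting vector would then be passed as dynamic_features in the create_lagged_df() call from the question in place of colnames(data_train). In practice you would likely trim it further, keeping only features whose values are known at forecast time.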

Upvotes: 1
