Reputation: 19
I developed a model in Keras that maps an input matrix x (168x326) to an output vector y (168x1). Each input x is week i, with 326 features recorded hourly for 168 hours. Each output y is week i+1, with 168 hourly prices. The training set contains 208 pairs of weeks (x_train->y_train), and the test set contains 51 pairs (x_test->y_test). The datasets are 3D tensors with the following shapes:
print(x_train.shape)
print(y_train.shape)
print(x_test.shape)
print(y_test.shape)
*Output:
x_train: (208, 168, 326)
y_train: (208, 168, 1)
x_test: (51, 168, 326)
y_test: (51, 168, 1)*
I want to use these exact same datasets to perform price prediction using XGBoost. My model is built like this:
import xgboost as xgb

reg = xgb.XGBRegressor(n_estimators=1000)
reg.fit(x_train, y_train,
        eval_set=[(x_train, y_train), (x_test, y_test)],
        early_stopping_rounds=50,
        verbose=True)
However, when I run this I get an error saying that XGBoost expects 2D input:
ValueError: Please reshape the input data into 2-dimensional matrix.
I've done some tests removing or reshaping dimensions in the datasets, but I haven't succeeded. Could someone tell me how to perform this conversion on the data? Thanks.
Upvotes: 0
Views: 498
Reputation: 19
First, I needed to flatten the last two dimensions into one, reducing each tensor from 3D to 2D. My tensors now have the following shapes: x_train: (208, 54768), y_train: (208, 168), x_test: (51, 54768) and y_test: (51, 168). Next, I discovered that these regressors do not handle multi-output targets by default. The workaround is scikit-learn's MultiOutputRegressor, which fits one separate regressor per output column:
from sklearn.multioutput import MultiOutputRegressor
Then you need to include the regressor inside this wrapper, like this:
from xgboost import XGBRegressor
reg = MultiOutputRegressor(XGBRegressor())
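The flattening step mentioned at the start of this answer can be sketched with NumPy's reshape. This is only an illustration: the zero-filled arrays are placeholders with the shapes from the question, and flatten_weeks is a hypothetical helper name, not part of any library.

```python
import numpy as np

# Placeholder arrays with the shapes from the question (zeros stand in for real data).
x_train = np.zeros((208, 168, 326), dtype=np.float32)
y_train = np.zeros((208, 168, 1), dtype=np.float32)
x_test = np.zeros((51, 168, 326), dtype=np.float32)
y_test = np.zeros((51, 168, 1), dtype=np.float32)


def flatten_weeks(a):
    """Collapse every axis after the first into one: (n, h, f) -> (n, h * f)."""
    return a.reshape(a.shape[0], -1)


x_train_2d = flatten_weeks(x_train)
y_train_2d = flatten_weeks(y_train)
x_test_2d = flatten_weeks(x_test)
y_test_2d = flatten_weeks(y_test)

print(x_train_2d.shape, y_train_2d.shape)  # (208, 54768) (208, 168)
print(x_test_2d.shape, y_test_2d.shape)    # (51, 54768) (51, 168)
```

Each row is now one week, with the 168 hours x 326 features laid out side by side, so the hourly ordering is preserved inside the row but the model no longer sees it as a separate axis.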
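I tested this with XGBoost and LightGBM and it worked well. If you're using CatBoost, however, it is better to pass your data through the library's own Pool class, since CatBoost can handle multiple outputs directly with the MultiRMSE loss. Import it with:
from catboost import CatBoostRegressor, Pool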
The code looks like this:
dtrain = Pool(x_train, label=y_train)
params = {'iterations': 500, 'learning_rate': 0.1, 'depth': 3, 'loss_function': 'MultiRMSE'}
CAT_reg = CatBoostRegressor(**params)
CAT_reg.fit(dtrain)
Upvotes: 0