Reputation: 915
Is it possible to train a model with xgboost that has multiple continuous outputs (multi-output regression)? What would be the objective for training such a model?
Thanks in advance for any suggestions
Upvotes: 49
Views: 44761
Reputation: 1833
The 2.0.0 xgboost release supports multi-target trees with vector-leaf outputs. That is, xgboost can now build multi-output trees where the size of a leaf equals the number of targets. The hist tree method must be used, and you specify the training parameter multi_strategy="multi_output_tree" to build a multi-output tree:
clf = xgb.XGBClassifier(tree_method="hist", multi_strategy="multi_output_tree")
A regression example based on @ComeOnGetMe's answer:
import numpy as np
import pandas as pd
import xgboost as xgb
print('xgb version:', xgb.__version__)
# get some noised linear data
X = np.random.random((1000, 10))
a = np.random.random((10, 3))
y = np.dot(X, a) + np.random.normal(0, 1e-3, (1000, 3))
# fitting
multioutputregressor = xgb.XGBRegressor(objective="reg:squarederror",
                                        tree_method="hist",
                                        multi_strategy="multi_output_tree")
multioutputregressor.fit(X, y)
# predicting on the training data
print('mse:', np.mean((multioutputregressor.predict(X) - y)**2, axis=0))
Output:
xgb version: 2.0.0
mse: [9.43447858e-05 8.78643942e-05 9.99183540e-05]
Setting multi_strategy="one_output_per_tree", the default, will instead build one model per target. In general, I would expect that option to give better results.
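For comparison, here is the same fit with the default strategy (a minimal sketch reusing X and y from the example above):
# one independent model per target (the default strategy)
per_target = xgb.XGBRegressor(objective="reg:squarederror",
                              tree_method="hist",
                              multi_strategy="one_output_per_tree")
per_target.fit(X, y)
print('mse:', np.mean((per_target.predict(X) - y)**2, axis=0))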
See also the xgboost tutorial for multi-output regression.
As of this writing, xgboost support for multiple outputs is considered experimental, and only the Python package is tested.
Upvotes: 3
Reputation: 1
You can use linear regression, random forest regressors, and some other related algorithms in scikit-learn to produce multi-output regression; I am not sure about XGBoost. The gradient boosting regressor in scikit-learn does not allow multiple outputs. For those asking when this may be necessary: one example is forecasting a time series multiple steps ahead.
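For instance, a minimal sketch of multi-step forecasting with scikit-learn's RandomForestRegressor, which accepts a 2-D y natively (the windowed series here is made up purely for illustration):
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# toy series: predict the next 3 values from the previous 10
rng = np.random.default_rng(0)
series = rng.normal(size=2000).cumsum()
X = np.array([series[i:i + 10] for i in range(len(series) - 13)])
y = np.array([series[i + 10:i + 13] for i in range(len(series) - 13)])

# random forests in scikit-learn fit a 2-D y directly
forest = RandomForestRegressor(n_estimators=100).fit(X, y)
print(forest.predict(X[:2]).shape)  # (2, 3): one row per sample, one column per step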
Upvotes: 0
Reputation: 10011
@ComeOnGetMe's code generates a warning: reg:linear is now deprecated in favor of reg:squarederror. So here is an updated version of that answer:
import numpy as np
import pandas as pd
import xgboost as xgb
from sklearn.multioutput import MultiOutputRegressor
# get some noised linear data
X = np.random.random((1000, 10))
a = np.random.random((10, 3))
y = np.dot(X, a) + np.random.normal(0, 1e-3, (1000, 3))
# fitting
multioutputregressor = MultiOutputRegressor(xgb.XGBRegressor(objective='reg:squarederror')).fit(X, y)
# predicting
print(np.mean((multioutputregressor.predict(X) - y)**2, axis=0))
Out:
[2.00592697e-05 1.50084441e-05 2.01412247e-05]
Upvotes: 6
Reputation: 1117
My suggestion is to use sklearn.multioutput.MultiOutputRegressor as a wrapper around xgb.XGBRegressor. MultiOutputRegressor trains one regressor per target and only requires that the regressor implement fit and predict, which xgboost happens to support.
import numpy as np
import xgboost as xgb
from sklearn.multioutput import MultiOutputRegressor

# get some noised linear data
X = np.random.random((1000, 10))
a = np.random.random((10, 3))
y = np.dot(X, a) + np.random.normal(0, 1e-3, (1000, 3))
# fitting: one XGBRegressor is trained per target column of y
multioutputregressor = MultiOutputRegressor(xgb.XGBRegressor(objective='reg:linear')).fit(X, y)
# predicting
print(np.mean((multioutputregressor.predict(X) - y)**2, axis=0))  # 0.004, 0.003, 0.005
This is probably the easiest way to regress multi-dimensional targets using xgboost, as you would not need to change any other part of your code (if you were using the sklearn API originally). However, this method does not leverage any possible relationships between the targets; you could try to design a customized objective function to achieve that, as sketched below.
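As a rough sketch of that idea, modeled on the custom-objective demo in the xgboost repository and assuming xgboost >= 2.0 with the multi_strategy support from another answer here. The coupling matrix W is made up purely for illustration, and the exact array shapes the callback receives vary by version, so treat this as illustrative rather than drop-in:
import numpy as np
import xgboost as xgb

N_TARGETS = 3

# hypothetical coupling matrix: off-diagonal entries let the residual on one
# target contribute to the gradient of another (plain squared error is W = I)
W = np.array([[1.0, 0.1, 0.0],
              [0.1, 1.0, 0.1],
              [0.0, 0.1, 1.0]])

def coupled_squared_error(y_true, y_pred):
    # some versions hand the callback flattened arrays, so restore the
    # (n_samples, n_targets) layout before mixing targets
    y_true = y_true.reshape(-1, N_TARGETS)
    y_pred = y_pred.reshape(-1, N_TARGETS)
    resid = y_pred - y_true
    grad = resid @ W               # gradient of the coupled loss 0.5 * resid W resid^T
    hess = np.ones_like(grad)      # diagonal of W is 1, as for plain squared error
    return grad.reshape(-1), hess.reshape(-1)

reg = xgb.XGBRegressor(tree_method="hist",
                       multi_strategy="multi_output_tree",
                       objective=coupled_squared_error)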
Upvotes: 67
Reputation: 21
Based on the above discussion, I have extended the univariate XGBoostLSS to a multivariate framework called Multi-Target XGBoostLSS Regression that models multiple targets and their dependencies in a probabilistic regression setting. Code follows soon.
Upvotes: 1
Reputation: 69
I would post a comment but I lack the reputation. Adding to @Jesse Anderson's answer: to install the most recent version, select the top link from here: https://s3-us-west-2.amazonaws.com/xgboost-nightly-builds/list.html?prefix=master/
Make sure to select the one for your operating system, then use pip install to install the wheel, e.g. for macOS:
pip install https://s3-us-west-2.amazonaws.com/xgboost-nightly-builds/master/xgboost-1.6.0.dev0%2B4d81c741e91c7660648f02d77b61ede33cef8c8d-py3-none-macosx_10_15_x86_64.macosx_11_0_x86_64.macosx_12_0_x86_64.whl
Upvotes: 3
Reputation: 4603
Multiple output regression is now available in the nightly build of XGBoost, and will be included in XGBoost 1.6.0.
See https://github.com/dmlc/xgboost/blob/master/demo/guide-python/multioutput_regression.py for an example.
Upvotes: 12