Reputation: 325
I am working on a time series regression problem: predicting a stock's price for each of the next 5 days. I think this is a regression with multiple continuous outputs (multi-output regression), so I use MultiOutputRegressor from sklearn.
However, the predictions for the first three days matter more than those for the fourth and fifth days.
I therefore want to weight the loss so that errors on the first three days are penalized more heavily. Does anyone know how to do this with a custom objective function, or is there another way to solve this problem?
Data and code are below:
{"date": {"0": "2003-06-30", "1": "2003-07-01", "2": "2003-07-02", "3": "2003-07-03", "4": "2003-07-04", "5": "2003-07-07", "6": "2003-07-08", "7": "2003-07-09", "8": "2003-07-10", "9": "2003-07-11"}, "open": {"0": 37.1, "1": 37.09, "2": 38.17, "3": 40.6, "4": 39.1, "5": 39.6, "6": 42.0, "7": 41.3, "8": 41.2, "9": 39.6}, "max": {"0": 37.4, "1": 38.1, "2": 38.82, "3": 40.6, "4": 39.26, "5": 41.0, "6": 42.0, "7": 41.3, "8": 41.2, "9": 39.97}, "min": {"0": 36.92, "1": 37.09, "2": 38.1, "3": 38.81, "4": 38.75, "5": 39.6, "6": 40.7, "7": 40.81, "8": 40.05, "9": 39.3}, "close": {"0": 37.08, "1": 38.05, "2": 38.69, "3": 39.0, "4": 39.26, "5": 41.0, "6": 41.19, "7": 41.22, "8": 40.05, "9": 39.91}, "stock_id": {"0": 50, "1": 50, "2": 50, "3": 50, "4": 50, "5": 50, "6": 50, "7": 50, "8": 50, "9": 50}, "target_next_1_day": {"0": 38.05, "1": 38.69, "2": 39.0, "3": 39.26, "4": 41.0, "5": 41.19, "6": 41.22, "7": 40.05, "8": 39.91, "9": 40.66}, "target_next_2_day": {"0": 38.69, "1": 39.0, "2": 39.26, "3": 41.0, "4": 41.19, "5": 41.22, "6": 40.05, "7": 39.91, "8": 40.66, "9": 40.19}, "target_next_3_day": {"0": 39.0, "1": 39.26, "2": 41.0, "3": 41.19, "4": 41.22, "5": 40.05, "6": 39.91, "7": 40.66, "8": 40.19, "9": 40.85}, "target_next_4_day": {"0": 39.26, "1": 41.0, "2": 41.19, "3": 41.22, "4": 40.05, "5": 39.91, "6": 40.66, "7": 40.19, "8": 40.85, "9": 39.8}, "target_next_5_day": {"0": 41.0, "1": 41.19, "2": 41.22, "3": 40.05, "4": 39.91, "5": 40.66, "6": 40.19, "7": 40.85, "8": 39.8, "9": 39.92}}
import pandas as pd
import xgboost as xgb
from sklearn.multioutput import MultiOutputRegressor

# `data` is the dict shown above
data = pd.DataFrame(data)
X = data[['open', 'max', 'min', 'close']]
Y = data[['target_next_1_day', 'target_next_2_day', 'target_next_3_day', 'target_next_4_day', 'target_next_5_day']]

def customer_obj(target, predict):
    gradient = ????
    hess = ????
    return gradient, hess

clfpre = xgb.XGBRegressor(n_estimators=5, objective=customer_obj)
clf = MultiOutputRegressor(clfpre).fit(X.values, Y.values)
Upvotes: 1
Views: 1419
Reputation: 16966
MultiOutputRegressor just builds an independent model for each target variable. A custom objective function therefore cannot weight one target against another when it is used with MultiOutputRegressor.
Your customer_obj can only account for an individual target variable. For example, you could replace the default objective, reg:squarederror, with something like mean absolute error.
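To illustrate what a single-target objective looks like, here is a minimal sketch, not the asker's intended objective. It assumes the shape that XGBoost's scikit-learn wrapper accepts for a callable objective: a function of the true values and the predictions that returns the gradient and the Hessian of the loss with respect to the predictions. The name squared_error_obj and the predicted values are hypothetical. The target values are the first three target_next_1_day entries from the question.

```python
import numpy as np

def squared_error_obj(target, predict):
    """Single-target squared error: loss = 0.5 * (predict - target) ** 2."""
    gradient = predict - target        # first derivative w.r.t. predict
    hess = np.ones_like(predict)       # second derivative w.r.t. predict
    return gradient, hess

# One target column only: the function never sees the other horizons.
target = np.array([38.05, 38.69, 39.0])   # from target_next_1_day
predict = np.array([38.0, 39.0, 39.5])    # hypothetical predictions
gradient, hess = squared_error_obj(target, predict)
```

Because the function receives one column of targets at a time, nothing inside it can trade off day 1 against day 5.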
From the documentation:
Multi target regression
This strategy consists of fitting one regressor per target. This is a simple strategy for extending regressors that do not natively support multi-target regression.
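The "one regressor per target" behaviour can be checked directly. The sketch below uses synthetic random data (not the question's data) and LinearRegression as a stand-in base estimator, both chosen only for illustration. It compares the wrapper against five models fitted separately.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))   # synthetic features
Y = rng.normal(size=(50, 5))   # five synthetic targets, one per horizon

multi = MultiOutputRegressor(LinearRegression()).fit(X, Y)

# The wrapper stores one fitted estimator per target column.
n_models = len(multi.estimators_)

# Fitting each column on its own reproduces the wrapper's predictions.
separate = np.column_stack(
    [LinearRegression().fit(X, Y[:, j]).predict(X) for j in range(Y.shape[1])]
)
same = np.allclose(multi.predict(X), separate)
```

Since each column is fitted in isolation, any weighting between horizons would have to happen outside MultiOutputRegressor, for example in a model that is trained on all five targets jointly.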
Upvotes: 1