CYC

Reputation: 325

How to use a custom objective function with MultiOutputRegressor in a regression problem

I am working on a time series regression problem: predicting a stock's price for each of the next 5 days. I think this is a regression with multiple continuous outputs (multi-output regression).

Hence, I use MultiOutputRegressor in sklearn to do what I want.

However, the predictions for the first three days are more important than those for the fourth and fifth days.

Therefore, I want to apply a weight so that errors on the first three days are penalized more heavily. Does anyone know how to do this with a custom objective function, or is there another way to solve this problem?

Data and code are below.

{"date": {"0": "2003-06-30", "1": "2003-07-01", "2": "2003-07-02", "3": "2003-07-03", "4": "2003-07-04", "5": "2003-07-07", "6": "2003-07-08", "7": "2003-07-09", "8": "2003-07-10", "9": "2003-07-11"}, "open": {"0": 37.1, "1": 37.09, "2": 38.17, "3": 40.6, "4": 39.1, "5": 39.6, "6": 42.0, "7": 41.3, "8": 41.2, "9": 39.6}, "max": {"0": 37.4, "1": 38.1, "2": 38.82, "3": 40.6, "4": 39.26, "5": 41.0, "6": 42.0, "7": 41.3, "8": 41.2, "9": 39.97}, "min": {"0": 36.92, "1": 37.09, "2": 38.1, "3": 38.81, "4": 38.75, "5": 39.6, "6": 40.7, "7": 40.81, "8": 40.05, "9": 39.3}, "close": {"0": 37.08, "1": 38.05, "2": 38.69, "3": 39.0, "4": 39.26, "5": 41.0, "6": 41.19, "7": 41.22, "8": 40.05, "9": 39.91}, "stock_id": {"0": 50, "1": 50, "2": 50, "3": 50, "4": 50, "5": 50, "6": 50, "7": 50, "8": 50, "9": 50}, "target_next_1_day": {"0": 38.05, "1": 38.69, "2": 39.0, "3": 39.26, "4": 41.0, "5": 41.19, "6": 41.22, "7": 40.05, "8": 39.91, "9": 40.66}, "target_next_2_day": {"0": 38.69, "1": 39.0, "2": 39.26, "3": 41.0, "4": 41.19, "5": 41.22, "6": 40.05, "7": 39.91, "8": 40.66, "9": 40.19}, "target_next_3_day": {"0": 39.0, "1": 39.26, "2": 41.0, "3": 41.19, "4": 41.22, "5": 40.05, "6": 39.91, "7": 40.66, "8": 40.19, "9": 40.85}, "target_next_4_day": {"0": 39.26, "1": 41.0, "2": 41.19, "3": 41.22, "4": 40.05, "5": 39.91, "6": 40.66, "7": 40.19, "8": 40.85, "9": 39.8}, "target_next_5_day": {"0": 41.0, "1": 41.19, "2": 41.22, "3": 40.05, "4": 39.91, "5": 40.66, "6": 40.19, "7": 40.85, "8": 39.8, "9": 39.92}}

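import pandas as pd
import xgboost as xgb
from sklearn.multioutput import MultiOutputRegressor

data = pd.DataFrame(data)  # `data` is the dict shown above
X = data[['open', 'max', 'min', 'close']]
Y = data[['target_next_1_day', 'target_next_2_day', 'target_next_3_day', 'target_next_4_day', 'target_next_5_day']]

def customer_obj(target, predict):
    # Should return the per-sample gradient and hessian of the loss with respect to predict
    gradient = ????
    hess = ????
    return gradient, hess

clfpre = xgb.XGBRegressor(n_estimators=5, objective=customer_obj)
clf = MultiOutputRegressor(clfpre).fit(X.values, Y.values)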

Upvotes: 1

Views: 1419

Answers (1)

Venkatachalam

Reputation: 16966

MultiOutputRegressor simply fits an independent model for each target variable. A custom objective function passed to the base estimator therefore cannot weight one target against another.

Your customer_obj only ever sees a single target variable at a time. You could, for example, use it to replace the default objective reg:squarederror with something like mean absolute error.
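To illustrate why a per-target weight does not help here, below is a minimal sketch of a weighted squared-error objective for a single target, written with the same (target, predict) signature as customer_obj. The factory weighted_squared_error_obj and the prediction values are made up for illustration; only the three target values come from the question's data. It assumes the loss weight * 0.5 * (predict - target)**2 and compares the unregularized Newton step with and without the weight.

```python
import numpy as np

def weighted_squared_error_obj(weight):
    """Hypothetical helper: objective for loss = weight * 0.5 * (predict - target)**2."""
    def objective(target, predict):
        gradient = weight * (predict - target)   # first derivative w.r.t. predict
        hess = weight * np.ones_like(predict)    # second derivative w.r.t. predict
        return gradient, hess
    return objective

# First three target_next_1_day values from the question; predictions are made up.
target = np.array([38.05, 38.69, 39.0])
predict = np.array([37.5, 38.0, 39.5])

grad_1, hess_1 = weighted_squared_error_obj(1.0)(target, predict)
grad_3, hess_3 = weighted_squared_error_obj(3.0)(target, predict)

# Unregularized Newton step: -sum(gradient) / sum(hessian)
step_1 = -grad_1.sum() / hess_1.sum()
step_3 = -grad_3.sum() / hess_3.sum()

print(step_1, step_3)  # the constant weight cancels out of the ratio
```

Because each model sees only one target, a constant weight scales the gradient and the hessian by the same factor and cancels out of the step. In XGBoost the weight would still interact with the regularization terms, so the fitted trees are not guaranteed to be identical, but the weight cannot express "days 1 to 3 matter more than days 4 and 5" across separately trained models.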

From the documentation:

Multi target regression

This strategy consists of fitting one regressor per target. This is a simple strategy for extending regressors that do not natively support multi-target regression.
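The one-regressor-per-target behaviour can be checked directly. The following is a small sketch using random data and LinearRegression as a stand-in base estimator (both are assumptions for illustration, not the question's data or its XGBRegressor). It shows that the wrapper holds five fitted estimators and that one of them matches a model trained on that target column alone.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import MultiOutputRegressor

# Synthetic data: 4 features and 5 targets, mirroring the shapes in the question.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
Y = rng.normal(size=(20, 5))

multi = MultiOutputRegressor(LinearRegression()).fit(X, Y)
n_models = len(multi.estimators_)  # one fitted estimator per target column

# A standalone model fitted on the third target alone gives the same predictions
# as the third column of the wrapper's output.
single = LinearRegression().fit(X, Y[:, 2])
same = np.allclose(multi.predict(X)[:, 2], single.predict(X))

print(n_models, same)
```

Since the estimators never see each other's targets, any trade-off between the five horizons has to come from a model that is trained on all of them jointly.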

Upvotes: 1
