Reputation: 21
I have a variable whose value I need to predict as closely as possible, but without exceeding it. For example, given y_true = 9000, I want y_pred to be any value in the range [0,9000], as close to 9000 as possible. Likewise, if y_true = 8000, y_pred should be in [0,8000]. In other words, I want to put a restriction on the predicted value, and that threshold is individual for each prediction/target pair in the sample: if y_true = [8750,9200,8900,7600], then y_pred should be [<=8750,<=9200,<=8900,<=7600]. The only task is to never predict more than the target while getting as close to it as possible. Zero would always satisfy the constraint, but I need to get as close as possible.
data, target = np.array(data),np.array(df_tar)
X_train,X_test,y_train,y_test=train_test_split(data,target)
gbr = GradientBoostingRegressor(max_depth=1,n_estimators=100)
%time gbr.fit(X_train,np.ravel(y_train))
print(gbr.score(X_test,y_test),gbr.score(X_train,y_train))
Upvotes: 2
Views: 305
Reputation: 18377
Changing the model itself so that sklearn enforces this constraint during fitting is complicated, so I strongly suggest applying the restriction as a filter after prediction: replace every predicted value that exceeds its target with the target value. Afterwards, compute the score manually; I use MSE in this scenario (note that gbr.score returns R², not MSE).
Here is a full working example of my approach:
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error as mse
import numpy as np
X = [[8500,9500],[9200,8700],[8500,8250],[5850,8800]]
y = [8750,9200,8900,7600]
data, target = np.array(X),np.array(y)
gbr = GradientBoostingRegressor(max_depth=1,n_estimators=100)
gbr.fit(data,np.ravel(target))
predictions = gbr.predict(data)
print(predictions)  # The original predictions
Output:
[8750.14958301 9199.23464805 8899.87846735 7600.73730159]
Perform the replacement:
fixed_predictions = np.array([z if y>z else y for y,z in zip(target,predictions)])
print(fixed_predictions)
Output:
[8750. 9199.23464805 8899.87846735 7600. ]
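The same capping can also be written without a Python-level loop. A minimal sketch, assuming the targets and predictions are NumPy arrays of equal length; the numbers are copied from the printed output above, and `capped` is only an illustrative name:

```python
import numpy as np

# Values copied from the example above.
target = np.array([8750, 9200, 8900, 7600])
predictions = np.array([8750.14958301, 9199.23464805, 8899.87846735, 7600.73730159])

# Element-wise minimum: keep each prediction unless it exceeds its own target,
# in which case the target value is used instead.
capped = np.minimum(predictions, target)
print(capped)
```

Keep in mind that this kind of capping needs the true values, so it can only be applied where the targets are known.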
Compute the new score (on the fixed predictions, not the original ones):
score = mse(target,fixed_predictions)
print(score)
Output (approximately):
0.1501
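Because the capping happens outside the model, any score has to be computed from the capped array rather than through the estimator. A rough sketch that computes MSE and R² by hand with NumPy and checks the "never above the target" condition; the capped values are copied from the example above, and all variable names here are illustrative:

```python
import numpy as np

# Targets and capped predictions, copied from the example above.
target = np.array([8750.0, 9200.0, 8900.0, 7600.0])
capped = np.array([8750.0, 9199.23464805, 8899.87846735, 7600.0])

# Sanity check: no capped prediction may exceed its target.
assert np.all(capped <= target)

residuals = target - capped

# Mean squared error of the capped predictions.
mse_value = np.mean(residuals ** 2)

# Coefficient of determination (R^2), computed from its definition.
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((target - target.mean()) ** 2)
r2_value = 1.0 - ss_res / ss_tot

print(mse_value, r2_value)
```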
Upvotes: 2