I am using the Python xgboost library, and I am unable to get a simple working example using the gblinear booster:
import numpy as np
import xgboost as xgb
import matplotlib.pyplot as plt

# data: y = 2 * x, an exact linear relationship
M = np.array([
    [1, 2],
    [2, 4],
    [3, 6],
    [4, 8],
    [5, 10],
    [6, 12],
])
xg_reg = xgb.XGBRegressor(objective='reg:linear', booster='gblinear')
X, y = M[:, :-1], M[:, -1]
xg_reg.fit(X, y)

# predict over a wider range than the training data and plot against the real values
plt.scatter(range(-5, 20), [xg_reg.predict([i]) for i in range(-5, 20)])
plt.scatter(M[:, 0], M[:, -1])
plt.show()
Predictions are in blue, and real data in orange
Am I missing something?
I think the issue is that the model does not converge to the optimum with the configuration and the amount of data you have chosen. GBMs do not fit the target directly with the boosted model; instead, each boosting round fits the gradient of the loss, and only a fraction of that round's prediction (the fraction is the learning rate) is added to the prediction from the previous round.
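To illustrate the mechanism, here is a minimal sketch of the additive update for squared-error loss. It is not xgboost's actual gblinear updater, and the learning rate and number of rounds are made-up values for illustration:

import numpy as np

X = np.arange(1, 7, dtype=float)
y = 2 * X

learning_rate = 0.1   # illustrative value only
prediction = np.zeros_like(y)
for _ in range(10):   # 10 boosting rounds
    residual = y - prediction               # negative gradient of squared error
    coef = (X @ residual) / (X @ X)         # best single linear coefficient for the residual
    prediction += learning_rate * coef * X  # only a fraction of the fit is added

print(prediction)  # still well below y = 2*x with a small learning rate and few rounds

With a larger learning rate or more rounds, the loop above converges to y, which is exactly what the changes below exploit.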
So the obvious ways to improve are to increase the learning rate, increase the number of iterations, or increase the amount of data.
For example, this variant of your code gives already a better prediction:
# note the increased learning rate!
X = np.arange(1, 7).reshape(-1, 1)
y = 2 * X
xg_reg = xgb.XGBRegressor(objective='reg:linear', booster='gblinear', learning_rate=1)
xg_reg.fit(X, y, verbose=20, eval_set=[(X, y)])

plt.scatter(range(-5, 20), [xg_reg.predict([i]) for i in range(-5, 20)], label='prediction')
plt.scatter(X[:20, :], y[:20], label='target')
plt.legend()
plt.show()
This leads to a metric value of 0.872 on the training data (I've added an eval_set to the fit call to see how it changes). It is further reduced to ~0.1 if you increase the number of samples from 7 to 70.
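If you want to inspect the per-iteration metric programmatically rather than reading the printed log, the sklearn wrapper exposes it via evals_result(). A small sketch; the keys 'validation_0' and 'rmse' are what I'd expect for the first eval_set entry and the default regression metric, so check your own output:

# after xg_reg.fit(..., eval_set=[(X, y)]) has run
history = xg_reg.evals_result()
rmse_per_round = history['validation_0']['rmse']
print(rmse_per_round[-1])   # final training RMSE, e.g. ~0.872 in the run above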