Reputation: 137
I'm having trouble understanding why all the values returned by the predict_proba function of the xgboost library in Python fall in such a narrow range, even though the model's AUC on the test set is good enough for the problem at hand (0.78).
As you can see, the variance is low and the results cluster around the 50% mark.
The test set is approximately 15% of the available data (5000 observations).
I'm using the following parameters:
{'colsample_bytree': 0.5, 'gamma': 2, 'learning_rate': 0.01, 'max_depth': 8, 'min_child_weight': 10,
'n_estimators': 10, 'scale_pos_weight': 7}
Am I missing something here?
Upvotes: 1
Views: 946
Reputation: 4879
Without access to the data you are working with, it is impossible to say exactly why you are seeing what you are seeing.
That said, however, predict_proba will only give 4 values.
Upvotes: 1
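One thing that can be checked without the data is how far the parameters in the question allow the output to move at all. The sketch below is plain arithmetic, not a call into xgboost. It takes learning_rate and n_estimators from the question and assumes, purely for illustration, that boosting starts from a probability of 0.5 and that no single leaf has a raw weight larger than 2 (max_leaf_weight is a hypothetical bound, not something reported by the library).

```python
import math


def sigmoid(margin):
    """Map a raw boosting margin (log-odds) to a probability."""
    return 1.0 / (1.0 + math.exp(-margin))


# Values taken from the parameter dict in the question.
learning_rate = 0.01
n_estimators = 10

# Assumptions for illustration only (not stated in the question):
base_margin = 0.0      # log-odds of a 0.5 starting probability
max_leaf_weight = 2.0  # hypothetical upper bound on a raw leaf weight

# Each tree can shift the margin by at most learning_rate * max_leaf_weight,
# so the total shift after all trees is bounded by the sum of those steps.
max_shift = n_estimators * learning_rate * max_leaf_weight

lowest = sigmoid(base_margin - max_shift)
highest = sigmoid(base_margin + max_shift)
print(f"probabilities stay within [{lowest:.3f}, {highest:.3f}]")
```

Under these assumptions the predictions cannot leave a band of roughly 0.45 to 0.55, which would look like the narrow spread described in the question. AUC depends only on how the observations are ranked, so a compressed range is compatible with a reasonable AUC. If this is what is happening, increasing n_estimators or learning_rate should widen the spread.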