Ismalyt

Reputation: 59

XGBoost predictions not working on AI Platform: 'feature_names mismatch'

I have deployed an XGBoost model to GCP's AI Platform (formerly ML Engine) to serve predictions; the model is stored on GCS as a joblib file. However, when I request predictions for a list of features, I get a 'feature_names mismatch' error.

AI Platform requires a specific format for input data:

A list of instances
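As a rough illustration of that format, assuming each instance is a flat list of numeric feature values, the request body could be built as below. The helper name build_request_body and the sample values are made up for this example.

```python
import json


def build_request_body(rows):
    """Wrap feature rows in the {"instances": [...]} envelope.

    Hypothetical helper: each row becomes a plain list of floats, so the
    body carries values only and no column names.
    """
    instances = [[float(value) for value in row] for row in rows]
    return {"instances": instances}


# Made-up sample rows with three features each.
rows = [[1, 0.5, 3], [2, 1.5, 0]]
body = build_request_body(rows)
print(json.dumps(body))
```

Note that nothing in this body says which column each value belongs to; only the position of a value within its row identifies the feature.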

Also, when I test predictions in JupyterLab, the .predict method of my classifier works when I pass it a DataFrame, but fails when I pass it arrays or single rows of the DataFrame.

The error message I get (on both AI Platform and JupyterLab) is:

{
  "error": "Prediction failed: Exception during sklearn prediction:
 feature_names mismatch: [THE FEATURES LIST] ['f0', 'f1', 'f2', 'f3', 'f4', 'f5', 'f6', 'f7', 'f8', 'f9', 'f10', 'f11', 'f12', 'f13', 'f14', 'f15', 'f16', 'f17', 'f18', 'f19', 'f20', 'f21', 'f22', 'f23', 'f24', 'f25', 'f26', 'f27', 'f28', 'f29', 'f30', 'f31', 'f32', 'f33', 'f34', 'f35', 'f36', 'f37', 'f38', 'f39', 'f40', 'f41', 'f42', 'f43', 'f44', 'f45', 'f46', 'f47', 'f48', 'f49', 'f50', 'f51', 'f52', 'f53', 'f54', 'f55', 'f56']
\nexpected [THE FEATURES LIST BUT NOT IN THE SAME ORDER] in input data
\ntraining data did not have the following fields: f23, f14, f41, f6, f19, f35, f5, f49, f50, f18, f25, f45, f36, f21, f42, f0, f2, f37, f44, f47, f16, f22, f1, f3, f8, f53, f33, f11, f38, f48, f12, f31, f39, f27, f40, f52, f26, f29, f43, f20, f4, f10, f7, f13, f28, f9, f56, f24, f17, f32, f34, f54, f51, f15, f30, f46, f55"
}

Maybe the input I provide is not what is expected, but there seems to be no other input option on AI Platform. I am looking for a solution specific to Google Cloud's AI Platform.

Upvotes: 1

Views: 758

Answers (2)

yxteh

Reputation: 156

TL;DR: Just before saving the model to the bucket, set model.feature_names = None

I ran into this issue too. Like you, I solved it by uploading a .bst file to the bucket. I wanted to investigate a little more, and here are my findings.

Suppose the model is saved with joblib: joblib.dump(xgbm, 'model.joblib').

Then loading the model with model = joblib.load("model.joblib") and reading model.feature_names gives you the list of feature names, which implies that the list of features is stored somewhere within the model.joblib file.

However, the request to AI Platform does not include the list of column names (AFAIK there is no way to include it unless you use a custom prediction routine), so the feature_names mismatch error is thrown when Google tries to do something along the lines of model.predict(xgb.DMatrix(np.asarray(instances))).

The same is true if the model is saved using pickle.

If the model is saved as a .bst, then model.feature_names is empty and everything works, since neither the model nor the prediction instances carry any information about the list of features.
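The reasoning above can be mimicked with a toy, pure-Python stand-in. This is not XGBoost's code: ToyModel, its check, and the feature names are all made up for illustration, and the only assumption taken from the question is that unnamed input ends up with placeholder names like f0, f1, ... as seen in the error message.

```python
def default_feature_names(n_features):
    # Placeholder names of the form shown in the error message: f0, f1, ...
    return [f"f{i}" for i in range(n_features)]


class ToyModel:
    """Hypothetical stand-in for a loaded model; not an XGBoost class."""

    def __init__(self, feature_names):
        # None means "no feature names stored with the model".
        self.feature_names = feature_names

    def predict(self, instance):
        # An unnamed instance only gets positional placeholder names.
        incoming = default_feature_names(len(instance))
        stored = self.feature_names
        if stored is not None and list(stored) != incoming:
            raise ValueError(f"feature_names mismatch: {stored} {incoming}")
        return sum(instance)  # dummy "prediction", just to return something


# Made-up names, as if the model had been trained on a named DataFrame.
model = ToyModel(feature_names=["age", "income", "score"])
instance = [35.0, 52000.0, 0.7]

try:
    model.predict(instance)
    failed = False
except ValueError:
    failed = True  # stored names differ from the f0, f1, f2 placeholders

# Clearing the stored names removes the comparison, as in the TL;DR.
model.feature_names = None
result = model.predict(instance)
```

In this toy version, the first call fails because the stored names do not match the placeholders, and the second succeeds once the names are cleared, which mirrors why a model without stored names accepts plain instance lists.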

Upvotes: 1

Ismalyt

Reputation: 59

I solved the problem by setting the version's framework to XGBoost (it was previously sklearn) and uploading a .bst file to the bucket instead of a .joblib file.

Thanks for your help @user260826 :)

Upvotes: 1
