Canovice

Reputation: 10173

R - figuring out what columns an xgboost model is expecting in new data for predictions

We have a .model file containing an XGBoost model. Here's the code we use to load it, along with the printed model:

> xg_model <- xgb.load("../model_outputs/our_saved_model.model")
> xg_model
##### xgb.Booster
raw: 1.6 Mb 
xgb.attributes:
  niter
niter: 149

I didn't create this model, but I am tasked with passing new data to the model in order to make predictions. Unfortunately, I am hitting this error:

Error in predict.xgb.Booster(xg_model, xgb.DMatrix(as.matrix(our_dataframe_of_data))) : 
  [01:34:01] amalgamation/../src/learner.cc:1183: Check failed: learner_model_param_.num_feature >= p_fmat->Info().num_col_ (38 vs. 40) : Number of columns does not match number of features in booster.

So our data frame has 40 columns, but the model was trained to expect 38. What's unclear is exactly which 38 columns xg_model expects. Is there a function to call, a plot to draw, or anything else that shows which 38 columns the model was trained on? We only have the trained model, not the R code that trained it.

Upvotes: 1

Views: 1032

Answers (2)

prasanna sundar

Reputation: 33

I had the same issue. I solved it by extracting the model's features like this:

ModelVars <- xgb.importance(feature_names = colnames(our_dataframe_of_data), model = xg_model)

After that, it was just a matter of subsetting my data frame to the features listed in ModelVars. I was then able to call predict and get scores, even though the number of columns I passed was smaller than the number of features in the training dataset.
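One caveat worth checking before relying on this. As I understand the xgboost R package, xgb.importance() attaches the supplied feature_names to the model's internal feature indices (0, 1, 2, ...) purely by position, and it only reports features that were used in at least one split. So ModelVars is only correctly labelled if the first 38 columns of our_dataframe_of_data are in the training order, and it can list fewer than 38 features. The toy model below (trained on a few mtcars columns plus a constant column named const, all purely illustrative and not the asker's model) shows the second point:

```r
library(xgboost)

# Illustrative stand-in for the unknown model: four mtcars columns plus a
# constant column, which no tree can ever split on
X <- cbind(as.matrix(mtcars[, c("cyl", "disp", "hp", "wt")]), const = 1)
dtrain <- xgb.DMatrix(X, label = mtcars$mpg)
toy_model <- xgb.train(
  params = list(objective = "reg:squarederror", max_depth = 2),
  data = dtrain,
  nrounds = 5
)

# feature_names are mapped onto feature indices by position; only features
# used in at least one split show up in the importance table
imp <- xgb.importance(feature_names = colnames(X), model = toy_model)
print(imp$Feature)

# The constant column is absent, so the table lists fewer features than the
# booster actually expects as input
print(setdiff(colnames(X), imp$Feature))
```

Because XGBoost's core addresses features by column index, a matrix subset to fewer columns than the model expects may still pass the num_feature check but shift later columns into the wrong feature slots, so predictions obtained this way deserve a sanity check.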

Upvotes: 1

user1808924

Reputation: 4926

What's your XGBoost version? It matters, because XGBoost's "schema specification" (how feature counts and names are stored with a model) has evolved significantly between versions.

For now, explore which elements are available on your xgb.Booster object. In particular, check whether nfeatures and feature_names are defined:

print(xg_model$nfeatures)
print(xg_model$feature_names)

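I believe your xgb.Booster object may have these available. The booster clearly knows it needs 38 features (the error reports learner_model_param_.num_feature, which is stored in the model itself), but whether the column names were saved alongside that count depends on your XGBoost version and how the model was saved.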
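If xg_model$feature_names does return the 38 names, the data frame still has to be cut down to exactly those columns, in exactly that order. Here is a minimal base-R sketch, assuming the names are available; align_to_model is a hypothetical helper, not an xgboost function, and the example data is made up:

```r
# Hypothetical helper (not part of xgboost): reduce and reorder a data frame
# to the exact column order a model expects, failing loudly on gaps
align_to_model <- function(df, expected_names) {
  missing_cols <- setdiff(expected_names, colnames(df))
  if (length(missing_cols) > 0) {
    stop("Columns required by the model are missing: ",
         paste(missing_cols, collapse = ", "))
  }
  extra_cols <- setdiff(colnames(df), expected_names)
  if (length(extra_cols) > 0) {
    message("Dropping columns not used by the model: ",
            paste(extra_cols, collapse = ", "))
  }
  # Select by name so the output column order matches expected_names exactly
  as.matrix(df[, expected_names, drop = FALSE])
}

# Made-up example: the "model" expects (a, b); the data has an extra column c
df <- data.frame(b = 1:2, a = 3:4, c = 5:6)
m <- align_to_model(df, c("a", "b"))
print(m)
```

With the real model this would become predict(xg_model, xgb.DMatrix(align_to_model(our_dataframe_of_data, xg_model$feature_names))), provided feature_names is not NULL. Selecting by name rather than by position is the point of the design: it guarantees each column lands in the feature slot the model was trained with.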

Upvotes: 2
