Gaurav Bansal

Reputation: 5660

Unable to open pickled SageMaker XGBoost model

I'm trying to open a pickled XGBoost model I created in AWS SageMaker to look at the model's feature importances. I'm trying to follow the answers in this post. However, I get the error shown below. The error message suggests calling Booster.save_model, but when I try that, I get an error saying 'Estimator' object has no attribute 'save_model'. How can I resolve this?

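# Imports (SageMaker Python SDK v1); bucket, prefix, role and region are defined elsewhere
import pickle
import tarfile
from time import gmtime, strftime

import s3fs
import sagemaker
from sagemaker.amazon.amazon_estimator import get_image_uri

# Build initial model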
sess = sagemaker.Session()
s3_input_train = sagemaker.s3_input(s3_data='s3://{}/{}/train/'.format(bucket, prefix), content_type='csv')
xgb_cont = get_image_uri(region, 'xgboost', repo_version='0.90-1')
xgb = sagemaker.estimator.Estimator(xgb_cont, role, train_instance_count=1, train_instance_type='ml.m4.4xlarge',
                                    output_path='s3://{}/{}'.format(bucket, prefix), sagemaker_session=sess)
xgb.set_hyperparameters(eval_metric='rmse', objective='reg:squarederror', num_round=100)
ts = strftime("%Y-%m-%d-%H-%M-%S", gmtime())
xgb_name = 'xgb-initial-' + ts
xgb.set_hyperparameters(eta=0.1, alpha=0.5, max_depth=10)
xgb.fit({'train': s3_input_train}, job_name=xgb_name)

# Load model to get feature importances
model_path = 's3://{}/{}/{}/output/model.tar.gz'.format(bucket, prefix, xgb_name)
fs = s3fs.S3FileSystem()
with fs.open(model_path, 'rb') as f:
    with tarfile.open(fileobj=f, mode='r') as tar_f:
        with tar_f.extractfile('xgboost-model') as extracted_f:
            model = pickle.load(extracted_f)

XGBoostError: [19:16:42] /workspace/src/learner.cc:682: Check failed: header == serialisation_header_: 

  If you are loading a serialized model (like pickle in Python) generated by older
  XGBoost, please export the model by calling `Booster.save_model` from that version
  first, then load it back in current version.  There's a simple script for helping
  the process. See:

    https://xgboost.readthedocs.io/en/latest/tutorials/saving_model.html

  for reference to the script, and more details about differences between saving model and
  serializing.

Upvotes: 5

Views: 5796

Answers (2)

Hassan Id Mansour

Reputation: 1

If you absolutely need to use pickle to load an XGBoost model and the latest version of XGBoost cannot read it, downgrading to the version that produced the model may be a viable solution:

pip uninstall xgboost
pip install xgboost==0.90

Then the following should work:

import pickle
with open("model.dat", "rb") as file:
    loaded_model = pickle.load(file)
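
Since the original goal was to inspect feature importances, here is a minimal sketch of what could follow once the pickle loads. It assumes the unpickled object is an xgboost Booster and that the extracted file is named "model.dat" as above; the helper names load_pickled_model, rank_importances and report are made up for illustration. With CSV training data the features will most likely show up as f0, f1, and so on, because no column names were supplied.

```python
import pickle


def load_pickled_model(path):
    """Unpickle a model file; xgboost must be importable for a pickled Booster."""
    with open(path, "rb") as f:
        return pickle.load(f)


def rank_importances(scores, top_n=10):
    """Sort a {feature_name: score} dict from most to least important."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


def report(path="model.dat", importance_type="gain"):
    """Print the top features of a pickled Booster (assumed object type)."""
    booster = load_pickled_model(path)
    # Booster.get_score returns a dict mapping feature name to importance
    scores = booster.get_score(importance_type=importance_type)
    for name, value in rank_importances(scores):
        print(name, value)
```

Calling report("model.dat") in the downgraded environment should print the ten highest-gain features. Passing importance_type="weight" counts splits instead of measuring gain.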

Upvotes: 0

Julien Simon

Reputation: 2729

Which version of XGBoost are you using in the notebook? The model format changed in XGBoost 1.0. See https://xgboost.readthedocs.io/en/latest/tutorials/saving_model.html. Short version: if you're using 1.0 or later in the notebook, you can't load a model that was pickled with an older version.
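
The error message itself points at a two-step workaround, which could be sketched roughly as follows. This is an illustration under assumptions, not a tested recipe: the helper names and the "model.bin" file name are hypothetical, step 1 has to run where the installed xgboost matches the training version (0.90 here), and step 2 relies on the newer xgboost still being able to read the legacy binary format, which is worth checking against the release notes of the version you have.

```python
import io
import pickle
import tarfile


def read_tar_member(fileobj, member="xgboost-model"):
    """Return the raw bytes of one member of a (possibly gzipped) tar archive."""
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        extracted = tar.extractfile(member)
        if extracted is None:
            raise FileNotFoundError(member)
        return extracted.read()


def export_pickled_booster(raw, out_path="model.bin"):
    """Step 1 (old xgboost): unpickle the Booster and re-save it with save_model."""
    booster = pickle.loads(raw)  # needs the xgboost version that wrote the pickle
    booster.save_model(out_path)
    return out_path


def load_exported_booster(path="model.bin"):
    """Step 2 (new xgboost): load the file written by save_model."""
    import xgboost as xgb  # imported lazily so step 1 and step 2 can live apart

    booster = xgb.Booster()
    booster.load_model(path)
    return booster
```

In the question's code, the file object returned by fs.open(model_path, 'rb') could be passed to read_tar_member, and the resulting bytes to export_pickled_booster. Note that save_model is a method of the unpickled Booster, not of the sagemaker Estimator, which is why calling it on the estimator raises the AttributeError.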

Here's a working example using XGBoost in script mode (which is much more flexible than the built-in algorithm):

Upvotes: 2
