Nick Dragosh

Reputation: 515

Reading a SageMaker-trained model from S3

Using Amazon SageMaker, I created an XGBoost model. After unpacking the resulting tar.gz file, I end up with a file named "xgboost-model".
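For reference, the unpacking step can also be done in memory, so neither the archive nor the extracted file has to touch the disk. This is a minimal sketch using only the standard library's tarfile and io modules. The helper names extract_member_bytes and build_archive are hypothetical. It also assumes that the model sits at the archive root under the name xgboost-model and that this file is a pickle. Newer versions of the built-in XGBoost algorithm may save the model in XGBoost's own format instead, in which case it would need to be loaded with XGBoost rather than pickle.

```python
import io
import pickle
import tarfile


def extract_member_bytes(archive_bytes: bytes, member_name: str = "xgboost-model") -> bytes:
    """Return the raw bytes of one file inside a .tar.gz archive held in memory.

    Assumption: the model file sits at the archive root under `member_name`.
    """
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tar:
        member = tar.extractfile(member_name)  # raises KeyError if the name is absent
        if member is None:  # the name exists but is a directory or special file
            raise KeyError(f"{member_name!r} is not a regular file in the archive")
        return member.read()


def build_archive(files: dict) -> bytes:
    """Build a .tar.gz archive in memory (only to keep this demo self-contained)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


# With boto3 the archive bytes would come from something like:
#   archive_bytes = client.get_object(Bucket=bucket, Key=key)["Body"].read()
# Here a pickled dict stands in for the real model (assumption: pickle format).
archive_bytes = build_archive({"xgboost-model": pickle.dumps({"max_depth": 6})})
model = pickle.loads(extract_member_bytes(archive_bytes))
print(model)
```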

The next step is to load the model with pickle directly from my S3 bucket, without downloading it first. Here is what I tried:

obj = client.get_object(Bucket='...',Key='xgboost-model')

xgb_model = pkl.load(open((obj['Body'].read())),"rb")

But it throws the error:

TypeError: embedded NUL character

Also tried this:

xgb_model = pkl.loads(open((obj['Body'].read())),"rb")

The outcome was the same.

Another approach:

bucket='...'
key='xgboost-model'

with s3io.open('s3://{0}/{1}'.format(bucket, key),mode='w') as s3_file:
  pkl.dump(mdl, s3_file)

This gives the error:

CertificateError: hostname bucket doesn't match either of '*.s3.amazonaws.com', 's3.amazonaws.com'

This happens even though the bucket is the same.

How can I load the model into a pickle object so that I can then use it for predictions?

Upvotes: 1

Views: 3353

Answers (2)

Diogo

Reputation: 11

If you trained the model with SageMaker's built-in XGBoost algorithm and later want to use it for predictions in a SageMaker environment, you can use the estimator's 'attach' method.

Right after fitting the XGBoost estimator, you can use

model_job_name = xgb_model._current_job_name

to get the training job's name. Alternatively, you can open the Training jobs section of the SageMaker console and find the name of the job you ran.

Later, when you want to reuse the model, run:

import sagemaker
reloaded_xgb_model = sagemaker.estimator.Estimator.attach(model_job_name)
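If you later want the trained artifact itself rather than the estimator, the attached estimator's model_data property should hold the S3 URI of the job's model.tar.gz. It is worth checking this against your SDK version. Fetching that archive with boto3's get_object, as in the question, requires splitting the URI into a bucket and a key. This is a minimal standard-library sketch. The function split_s3_uri and the example URI are hypothetical.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple:
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # urlparse keeps the leading slash on the path; S3 keys do not have one.
    return parsed.netloc, parsed.path.lstrip("/")


# Hypothetical URI; in practice it would be reloaded_xgb_model.model_data
bucket, key = split_s3_uri("s3://my-bucket/output/my-training-job/output/model.tar.gz")
print(bucket, key)
```

The resulting pair can then be passed as Bucket=bucket, Key=key to get_object.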

Upvotes: 0

raj

Reputation: 1213

I assume you trained the model with the SageMaker built-in XGBoost algorithm and would like to use it for predictions in your own hosting environment (not SageMaker hosting).

pickle.load(file) reads a pickled object from an open file object, whereas pickle.loads(bytes_object) reads a pickled object from a bytes object and returns the deserialized object. Because obj['Body'].read() has already loaded the S3 object into memory as bytes, you can pass those bytes straight to pickle.loads without calling open:

xgb_model = pkl.loads(obj['Body'].read())
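Here is a small, self-contained check of that difference. The payload stands in for obj['Body'].read(). It is a pickled dict rather than a real model, so no S3 access is needed. The check also shows why the original attempts failed: open() treats its first argument as a file path. Pickled bytes normally contain NUL bytes, and a path may not contain them.

```python
import io
import pickle

# Stand-in for obj['Body'].read(): raw bytes of a pickled object
# (a plain dict here, not an actual XGBoost model).
payload = pickle.dumps({"n_estimators": 100, "max_depth": 6})

# pickle.loads accepts the bytes directly.
model_a = pickle.loads(payload)

# pickle.load needs a file-like object; io.BytesIO wraps the bytes in one.
model_b = pickle.load(io.BytesIO(payload))

# open() interprets the bytes as a file *path*, so it fails before pickle runs.
# Python 3.5+ raises ValueError ("embedded null byte"); older versions raised
# TypeError ("embedded NUL character"), as in the question.
try:
    open(payload, "rb")
    open_error = None
except (ValueError, TypeError) as exc:
    open_error = exc

print(model_a == model_b, type(open_error).__name__)
```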

Upvotes: 1
