Inderpartap Cheema

Reputation: 483

How to save Tensorflow model in S3 (as /output/model.tar.gz) when using Tensorflow Estimator in AWS Sagemaker

I have a Keras model getting trained using an entry_point script and I am using the following pieces of code to store the model artifacts (in the entry_point script).

parser.add_argument('--model_dir', type=str, default=os.environ['SM_MODEL_DIR'])
args, _ = parser.parse_known_args()
model_dir = args.model_dir
---

tf.keras.models.save_model(
      model,
      os.path.join(model_dir, 'model/1'),
      overwrite=True,
      include_optimizer=True
     )

Ideally, model_dir should be /opt/ml/model, and SageMaker should automatically upload the contents of this folder to S3 as s3://<default_bucket>/<training_name>/output/model.tar.gz

When I run estimator.fit({'training': training_input_path}), the training is successful, but the CloudWatch logs show the following:

2020-09-16 02:49:12,458 sagemaker_tensorflow_container.training WARNING  No model artifact is saved under the path /opt/ml/model. Your training job will not save any model files to S3.

Even then, SageMaker does store my model artifacts; the only difference is that instead of being stored at s3://<default_bucket>/<training_name>/output/model.tar.gz, they are stored unzipped at s3://<default_bucket>/<training_name>/model/model/1/saved_model.pb along with the variables and assets folders. Because of this, the estimator.deploy() call fails, as it cannot find the artifacts in the output/ directory.

Sagemaker Python SDK - 2.6.0

Estimator code:

from sagemaker.tensorflow import TensorFlow

tf_estimator = TensorFlow(entry_point='autoencoder-model.py',
                          role=role,
                          instance_count=1,
                          instance_type='ml.m5.large',
                          framework_version="2.3.0",
                          py_version="py37",
                          debugger_hook_config=False,
                          hyperparameters={'epochs': 20},
                          source_dir='/home/ec2-user/SageMaker/model',
                          subnets=['subnet-1', 'subnet-2'],
                          security_group_ids=['sg-1', 'sg-1'])

What could I be doing wrong here?

Upvotes: 4

Views: 3984

Answers (1)

Sandeep Joshi

Reputation: 31

Changing:

parser.add_argument('--model_dir', type=str, default=os.environ['SM_MODEL_DIR'])

to:

parser.add_argument('--model-dir', type=str, default=os.environ['SM_MODEL_DIR']) 

fixed it for me. argparse converts hyphens in option names to underscores, so you can still reference args.model_dir.
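As a side note, this hyphen-to-underscore conversion is standard argparse behavior and can be checked outside SageMaker. A minimal sketch (the default and the simulated command line below are illustrative, not what SageMaker actually passes):

```python
import argparse

# argparse turns hyphens in a long option name into underscores
# when deriving the attribute name on the parsed namespace.
parser = argparse.ArgumentParser()
parser.add_argument('--model-dir', type=str, default='/opt/ml/model')

# Simulate the kind of command line the training container would pass.
args, _ = parser.parse_known_args(['--model-dir', '/opt/ml/model'])
print(args.model_dir)  # prints: /opt/ml/model
```

So the script's existing references to args.model_dir keep working unchanged after renaming the flag.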

The SageMaker container saves the trained model to 'model-dir', then creates a tar.gz archive of that directory and uploads it to the S3 location given by 'model_dir'.

'model-dir' is the location inside the container, /opt/ml/..

'model_dir' maps to the 'output_path' that we define in:

tf_estimator = TensorFlow(entry_point='autoencoder-model.py', role=role, output_path=output_path, .....)

Hope this helps to resolve the issue.

Upvotes: 1
