Reputation: 1846
I'm using a SageMaker training job to train an ML model, and I am attempting to output the model to a specific location on S3.
Code:
from sagemaker.sklearn.estimator import SKLearn

model_uri = "s3://***/model/"
script_path = 'entry_point.py'

sklearn = SKLearn(
    entry_point=script_path,
    train_instance_type="ml.m5.large",
    output_path=model_uri,
    role='***',
    sagemaker_session=sagemaker_session)
The issue I am having is that the training job saves the model twice: once at the top level of the S3 bucket, and once in the folder I specified (/model).
Is this expected behaviour when specifying output_path in the estimator? Is there a way to stop it?
Any help would be appreciated!
Upvotes: 0
Views: 2060
Reputation: 31
If you look in the top-level folder, it actually contains other information that the job creates (in particular the packaged entry point code that the SDK uploads), whereas the job folder inside your model/ folder contains the actual .joblib model from your script, packaged as a tar.gz file.
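If you want to confirm this yourself, here is a minimal listing sketch (the bucket name is a placeholder, not from the original post):

import boto3

s3 = boto3.client("s3")

# List everything the job wrote to the bucket; with the default settings the
# top-level prefix holds the uploaded source archive, while the model/ prefix
# holds the model.tar.gz produced by the training job.
resp = s3.list_objects_v2(Bucket="your-bucket")
for obj in resp.get("Contents", []):
    print(obj["Key"])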
Use the code_location parameter when creating the SKLearn object. For example:
model_uri = "s3://***/model/"
training_output_uri = "s3://***/training-output"
script_path = 'entry_point.py'

sklearn = SKLearn(
    entry_point=script_path,
    train_instance_type="ml.m5.large",
    output_path=model_uri,
    code_location=training_output_uri,
    role='***',
    sagemaker_session=sagemaker_session)
With this, the job's code artifacts are written under the "training-output" folder in the S3 bucket instead of at the top level, while the model artifact still goes to output_path.
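As a rough sketch of what a run might then look like (the "train" channel name and the training data path below are placeholders, not from the original post):

# Kick off training; the channel name and its S3 path are assumptions.
sklearn.fit({"train": "s3://***/training-data/"})

# The trained model archive should end up under output_path, e.g.
#   s3://***/model/<job-name>/output/model.tar.gz
print(sklearn.model_data)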
Reference: the code_location parameter comes from the Framework parent class, which SKLearn inherits from.
Upvotes: 2