Reputation: 173
I am trying to run custom python/sklearn sagemaker script on AWS, basically learning from these examples: https://github.com/aws/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/scikit_learn_randomforest/Sklearn_on_SageMaker_end2end.ipynb
All works fine if I define the arguments, train the model, and output the file:
import argparse, os
import joblib

parser = argparse.ArgumentParser()
parser.add_argument('--model-dir', type=str, default=os.environ.get('SM_MODEL_DIR'))
parser.add_argument('--train', type=str, default=os.environ.get('SM_CHANNEL_TRAIN'))
parser.add_argument('--test', type=str, default=os.environ.get('SM_CHANNEL_TEST'))
args = parser.parse_args()
# train the model...
joblib.dump(model, os.path.join(args.model_dir, "model.joblib"))
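For context on why this works: inside the training container, SageMaker sets `SM_MODEL_DIR` to a local directory (typically `/opt/ml/model`) and uploads that directory's contents to S3 only after training finishes, so `joblib.dump` writes to the local filesystem, not to an `s3://` URL. A minimal sketch of the env-var-default pattern, simulating the container environment locally (the env value here is an assumption for illustration):

```python
import argparse
import os

# Inside the container, SageMaker sets SM_MODEL_DIR to a local path
# (typically /opt/ml/model). Simulate that environment here:
os.environ['SM_MODEL_DIR'] = '/opt/ml/model'

parser = argparse.ArgumentParser()
parser.add_argument('--model-dir', type=str,
                    default=os.environ.get('SM_MODEL_DIR'))
args = parser.parse_args([])  # no CLI flags: fall back to env default

print(args.model_dir)  # a local directory path, not an S3 URI
```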
And call the job with:
aws_sklearn.fit({'train': 's3://path/to/train', 'test': 's3://path/to/test'}, wait=False)
In this case the model gets stored in a different, auto-generated bucket, which I do not want. I want to get the output (the .joblib file) in the same S3 bucket I took the data from. So I add a model-dir parameter:
aws_sklearn.fit({'train': 's3://path/to/train', 'test': 's3://path/to/test', 'model-dir': 's3://path/to/model'}, wait=False)
But it results in error:
FileNotFoundError: [Errno 2] No such file or directory: 's3://path/to/model/model.joblib'
The same happens if I hardcode the output path inside the training script.
So the main question: how can I get the output file into a bucket of my choice?
Upvotes: 2
Views: 2520
Reputation: 2765
You can use the parameter output_path when you define the estimator. If you use model_dir instead, I guess you have to create that bucket beforehand, but you have the advantage that artifacts can be saved in real time during training (if the instance has write access to S3). You can take a look at my repo for this specific case.
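The output_path suggestion might look like the following sketch. output_path is a real parameter of SageMaker Python SDK estimators; the entry-point script name, role ARN, framework version, and instance type below are placeholder assumptions, and the paths reuse the question's placeholders:

```python
from sagemaker.sklearn.estimator import SKLearn

# output_path steers the trained artifact (model.tar.gz) into a bucket
# of your choice instead of the auto-generated default bucket.
aws_sklearn = SKLearn(
    entry_point='train.py',            # assumed training script name
    framework_version='0.23-1',        # assumed sklearn container version
    instance_type='ml.m5.large',       # assumed instance type
    role='arn:aws:iam::123456789012:role/SageMakerRole',  # assumed role ARN
    output_path='s3://path/to/model',  # artifacts land here after training
)

aws_sklearn.fit({'train': 's3://path/to/train', 'test': 's3://path/to/test'}, wait=False)
```

Note that the channels dict passed to fit() stays unchanged; the output location is configured on the estimator, not as an input channel.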
Upvotes: 2