Reputation: 1715
Can I put a SageMaker estimator's entry point script in a subdirectory of source_dir? So far, it fails for me. Here is what I want to do:
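from sagemaker.sklearn.estimator import SKLearn

sklearn = SKLearn(
    entry_point="RandomForest/my_script.py",
    source_dir="../",
    hyperparameters={...},
    # ... remaining estimator arguments elided in the original
)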
I want to do this so I don't have to break my directory structure. I have some modules that I use in several SageMaker projects, and each project lives in its own directory:
my_git_repo/
    RandomForest/
        my_script.py
        my_sagemaker_notebook.ipynb
    TensorFlow/
        my_script.py
        my_other_sagemaker_notebook.ipynb
    module_imported_in_both_scripts.py
If I try to run this, SageMaker fails. It seems to turn the entry point path into a module name by simply dropping the .py extension, which leaves the slash in place:
/usr/bin/python3 -m RandomForest/my_script --bootstrap True --case nf_2 --max_features 0.5 --min_impurity_decrease 5.323785009485933e-06 --model_name model --n_estimators 455 --oob_score True
...
/usr/bin/python3: No module named RandomForest/my_script
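To make the failure concrete, here is a hypothetical reconstruction of what the log suggests (the container's actual code is not shown in the question). naive_module_name only strips the .py suffix, reproducing the argument in the log, while python -m expects dotted module names. Both function names are illustrative, not part of SageMaker.

```python
from pathlib import PurePosixPath


def naive_module_name(entry_point: str) -> str:
    """Strip '.py' only, as the log suggests; the '/' stays in the name."""
    return entry_point[:-3] if entry_point.endswith(".py") else entry_point


def dotted_module_name(entry_point: str) -> str:
    """Join the path components with dots, the form `python -m` understands."""
    return ".".join(PurePosixPath(entry_point).with_suffix("").parts)


if __name__ == "__main__":
    path = "RandomForest/my_script.py"
    print(naive_module_name(path))   # RandomForest/my_script -> "No module named ..."
    print(dotted_module_name(path))  # RandomForest.my_script
```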
Does anyone know a way around this other than putting my_script.py at the top level of source_dir?
Upvotes: 0
Views: 1371
Reputation: 4037
What if you set source_dir to my_git_repo/RandomForest?
Otherwise, you can use a build step (such as CodeBuild, though it could also be custom code, e.g. in Lambda or Airflow) to upload your script to S3 as a compressed artifact. That is how lower-level SDKs such as boto3 expect your script anyway; this kind of integration is shown in the boto3 section of the SageMaker Sklearn random forest demo.
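A minimal sketch of the packaging half of that approach, assuming the layout from the question: it builds a gzipped tarball with my_script.py flattened to the archive root and the shared module beside it, so inside the archive the script is no longer in a subdirectory. package_source is a hypothetical helper. The upload and training-job steps are only comments; the bucket name is made up, and the sagemaker_program / sagemaker_submit_directory hyperparameters are, as far as I know, what the framework containers read, so check the demo before relying on them.

```python
import tarfile
from pathlib import Path


def package_source(repo_root: str, entry_rel: str, shared: list[str], out_path: str) -> list[str]:
    """Write a .tar.gz with the entry script at the archive root plus shared modules.

    Returns the sorted member names so the resulting layout can be checked.
    """
    root = Path(repo_root)
    with tarfile.open(out_path, "w:gz") as tar:
        # Flatten e.g. RandomForest/my_script.py to my_script.py at the top level.
        tar.add(str(root / entry_rel), arcname=Path(entry_rel).name)
        # Keep shared modules at their repo-relative paths (top level in this layout).
        for rel in shared:
            tar.add(str(root / rel), arcname=rel)
    with tarfile.open(out_path, "r:gz") as tar:
        return sorted(tar.getnames())


# Next steps (assumptions, not executed here; "my-bucket" is hypothetical):
#   boto3.client("s3").upload_file(out_path, "my-bucket", "code/sourcedir.tar.gz")
#   then pass roughly these in create_training_job's HyperParameters:
#   "sagemaker_program": "my_script.py"
#   "sagemaker_submit_directory": "s3://my-bucket/code/sourcedir.tar.gz"
```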
Upvotes: 0
Reputation: 513
Unfortunately, this is a gap in functionality. There is some related work in https://github.com/aws/sagemaker-python-sdk/pull/941 that should also solve this issue, but for now you do need to put my_script.py in source_dir.
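Since the script has to sit at the top of source_dir, one hedged workaround that still keeps the repository layout is a thin launcher at that top level which hands off to the real script. This is not a SageMaker feature. The sketch assumes a hypothetical entry.py in my_git_repo/, used as entry_point="entry.py" with source_dir="../". It uses only the standard library's runpy.run_path, forwards the command-line hyperparameters unchanged, and puts the source_dir root on sys.path so module_imported_in_both_scripts still imports. launch and TRAINING_SCRIPT are illustrative names.

```python
import os
import runpy
import sys

# Hypothetical location of the real training script, relative to source_dir.
TRAINING_SCRIPT = os.path.join("RandomForest", "my_script.py")


def launch(script_path: str, source_root: str, args: list[str]) -> dict:
    """Run script_path as __main__ with the given CLI args; return its globals."""
    # Make top-level shared modules (e.g. module_imported_in_both_scripts) importable.
    if source_root not in sys.path:
        sys.path.insert(0, source_root)
    saved_argv = sys.argv
    # The target sees itself as argv[0], followed by the forwarded hyperparameters.
    sys.argv = [script_path, *args]
    try:
        return runpy.run_path(script_path, run_name="__main__")
    finally:
        sys.argv = saved_argv


# In entry.py the launcher would be invoked under the usual guard, e.g.:
#   if __name__ == "__main__":
#       here = os.path.dirname(os.path.abspath(__file__))
#       launch(os.path.join(here, TRAINING_SCRIPT), here, sys.argv[1:])
```

Because run_path executes the file with __name__ set to "__main__", an existing if __name__ == "__main__": block in my_script.py runs just as it would under python -m. The TensorFlow project would need its own launcher pointing at TensorFlow/my_script.py.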
Upvotes: 1