Gena Kukartsev

Reputation: 1715

AWS SageMaker SKLearn entry point in a subdirectory?

Can I point a SageMaker estimator's entry point at a script in a subdirectory of source_dir? So far, it fails for me. Here is what I want to do:

sklearn = SKLearn(
    entry_point="RandomForest/my_script.py",
    source_dir="../",
    hyperparameters={...

I want to do this so I don't have to break up my directory structure. I have some modules that I use in several SageMaker projects, and each project lives in its own directory:

my_git_repo/

  RandomForest/
    my_script.py
    my_sagemaker_notebook.ipynb

  TensorFlow/
    my_script.py
    my_other_sagemaker_notebook.ipynb

  module_imported_in_both_scripts.py

If I try to run this, SageMaker fails. It seems to derive a module name from the entry point path by stripping the .py extension, but it leaves the slash in place, so the result is not a valid argument for python -m:

/usr/bin/python3 -m RandomForest/my_script --bootstrap True --case nf_2 --max_features 0.5 --min_impurity_decrease 5.323785009485933e-06 --model_name model --n_estimators 455 --oob_score True

...

/usr/bin/python3: No module named RandomForest/my_script

Does anyone know a way around this other than putting my_script.py directly in source_dir?

Related to this question

Upvotes: 0

Views: 1371

Answers (2)

Olivier Cruchant

Reputation: 4037

What if you set source_dir to my_git_repo/RandomForest? Alternatively, you can use a build step (CodeBuild, for example, or custom code in Lambda or Airflow) to send your script to S3 as a compressed artifact. Lower-level SDKs such as boto3 expect the script in that form anyway; this type of integration is shown in the boto3 section of the SageMaker Sklearn random forest demo.
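A minimal sketch of the packaging step, assuming the goal is a gzipped tarball with the entry script and the shared module both at the archive root. The function name build_source_archive and the archive name sourcedir.tar.gz are illustrative choices, not something the thread specifies, and the upload to S3 is left out:

```python
import tarfile
from pathlib import Path


def build_source_archive(entry_script, extra_files, archive_path):
    """Pack the entry script and shared modules flat at the archive root.

    Hypothetical helper: the name and signature are illustrative only.
    """
    archive_path = Path(archive_path)
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in [Path(entry_script), *map(Path, extra_files)]:
            # arcname drops the parent directories, so
            # RandomForest/my_script.py is stored as my_script.py.
            tar.add(path, arcname=path.name)
    return archive_path


# Example call, run from my_git_repo/ (paths taken from the question):
# build_source_archive(
#     "RandomForest/my_script.py",
#     ["module_imported_in_both_scripts.py"],
#     "sourcedir.tar.gz",
# )
```

Because every file is stored under its base name, two inputs with the same file name would collide, so this only suits one project per archive.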

Upvotes: 0

lauren

Reputation: 513

Unfortunately, this is a gap in functionality. There is some related work in https://github.com/aws/sagemaker-python-sdk/pull/941 which should also solve this issue, but for now, you do need to put my_script.py in source_dir.
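One way to keep the repository layout and still satisfy that constraint is to copy the files into a throwaway staging directory before creating the estimator. This is only a sketch under that assumption; stage_source_dir is a hypothetical helper, not part of the SageMaker SDK:

```python
import shutil
import tempfile
from pathlib import Path


def stage_source_dir(entry_script, shared_modules, staging_root=None):
    """Copy the entry script and shared modules side by side into a temp dir.

    Hypothetical helper: returns the directory to pass as source_dir.
    """
    staging = Path(tempfile.mkdtemp(prefix="sm_src_", dir=staging_root))
    for path in [Path(entry_script), *map(Path, shared_modules)]:
        # Copy under the base name so the script ends up at the top level.
        shutil.copy2(path, staging / path.name)
    return staging


# Example call from the RandomForest notebook (paths taken from the question):
# staging = stage_source_dir(
#     "my_script.py", ["../module_imported_in_both_scripts.py"]
# )
# The estimator would then get entry_point="my_script.py" and
# source_dir=str(staging), with its other arguments unchanged.
```

The repository itself stays untouched; the staging directory can be deleted once the training job has been submitted.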

Upvotes: 1
