Reputation: 936
I am trying to create Sagemaker pipeline with multiple steps. I have some code which I would like to share across different steps. Next example is not exact but simplified version for illustration.
I have folder structure which looks next:
source_scripts/
├── utils
│ ├── logger.py
├── models/
│ ├── ground_truth.py
│ ├── document.py
├── processing/
│ ├── processing.py
│ └── main.py
└── training/
├── training.py
└── main.py
I would like to use code from models
and utils
inside training.py
as I don't know where exactly code mounted on Sagemaker instance I am using:
from ..common.ground_truth import GroundTruthRow
As I am building pipeline I create processing and training step:
script_processor = FrameworkProcessor()
args = script_processor.get_run_args(
source_dir="source_scripts"
code="processing/main.py"
)
step_process = ProcessingStep(
code=args.code
)
estimator = Estimator(
source_dir="source_scripts"
code="training/main.py"
)
step_train = TrainingStep(
estimator=estimator
)
But during pipeline execution it results in error:
ImportError: attempted relative import with no known parent package
Any suggestions how to share code across several SageMaker jobs in single pipeline without building custom docker image?
Upvotes: 0
Views: 106
Reputation: 267
Quickest solution for this issue is to move the entry point scripts (processing/main.py, training/main.py) to the upper level directly under "source_scripts/" like this:
source_scripts/
├── utils
│ └── logger.py
├── models/
│ ├── ground_truth.py
│ └── document.py
├── processing_main.py
└── training_main.py
and avoid using relative import with ..
in the entry point scripts.
from models.ground_truth import GroundTruthRow
The reason why you see the ImportError is because the entry point script's __name__
variable doesn't have package structure information unlike other modules.
If you print __name__
in models.ground_truth.py
, you would see something like models.ground_truth
and it contains package structure. But if you print __name__
in training/main.py
, you see __main__
, so Python cannot understand how to handle ..
.
If you need to keep the current directory structure, there may be more complicated solutions, but moving entry point files to upper level would be simplest.
Also, I recommend testing the solution in your handy local Python environment before running your SageMaker Pipeline.
Upvotes: 1