Avinash Srivastav

Reputation: 11

Install additional packages in a SageMaker pipeline

I want to install additional packages that will be used in a processing step.

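from sagemaker.processing import FrameworkProcessor, ProcessingOutput
from sagemaker.sklearn.estimator import SKLearn
from sagemaker.workflow.steps import ProcessingStep

# sagemaker_session and role are defined earlier in the notebook
sklearn_processor = FrameworkProcessor(
    estimator_cls=SKLearn,
    framework_version='0.23-1',
    instance_type="ml.t3.medium",
    instance_count=1,
    base_job_name="sklearn-abalone-process",
    sagemaker_session=sagemaker_session,
    role=role
)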

outputs = [
    ProcessingOutput(output_name="train", source="/opt/ml/processing/train"),
    ProcessingOutput(output_name="validation", source="/opt/ml/processing/validation"),
    ProcessingOutput(output_name="test", source="/opt/ml/processing/test")
]

step_process = ProcessingStep(
    name="Preprocess_Data",
    processor = sklearn_processor.run(outputs=outputs,
        code="pre-process.py", dependencies=["/home/sagemaker-user/dependencies/requirements.txt"])
)

When I run the ProcessingStep code above, I get ValueError: either step_args or processor need to be given, but not both.

The SageMaker SDK version is 2.197.0.

I went through the AWS documentation but had no luck.

Upvotes: 1

Views: 262

Answers (1)

MhFarahani

Reputation: 970

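Try this: pass the result of sklearn_processor.run(...) to ProcessingStep as step_args rather than as processor.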

# BASE_DIR: the directory that holds pre-process.py and requirements.txt
step_args = sklearn_processor.run(
    outputs=outputs,
    code="pre-process.py",
    source_dir=BASE_DIR,
)

step_process = ProcessingStep(
    name="Preprocess_Data",
    step_args=step_args,
)
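
A fuller sketch of the same fix, in case it helps. It assumes the processor is created with a PipelineSession (from sagemaker.workflow.pipeline_context), because as far as I know run() only returns step arguments under a pipeline session; with a regular Session it starts the job and returns None, which would likely produce the same ValueError. The function name build_processing_step and its two parameters are made up for illustration; everything else comes from the question.

```python
def build_processing_step(role, source_dir):
    """Return a ProcessingStep that runs pre-process.py from source_dir."""
    # Imported inside the function so the sketch loads without the SDK installed.
    from sagemaker.processing import FrameworkProcessor, ProcessingOutput
    from sagemaker.sklearn.estimator import SKLearn
    from sagemaker.workflow.pipeline_context import PipelineSession
    from sagemaker.workflow.steps import ProcessingStep

    # Assumption: a PipelineSession, so run() returns step arguments
    # instead of launching a processing job immediately.
    pipeline_session = PipelineSession()

    processor = FrameworkProcessor(
        estimator_cls=SKLearn,
        framework_version="0.23-1",
        instance_type="ml.t3.medium",
        instance_count=1,
        base_job_name="sklearn-abalone-process",
        sagemaker_session=pipeline_session,
        role=role,
    )

    outputs = [
        ProcessingOutput(output_name="train", source="/opt/ml/processing/train"),
        ProcessingOutput(output_name="validation", source="/opt/ml/processing/validation"),
        ProcessingOutput(output_name="test", source="/opt/ml/processing/test"),
    ]

    # source_dir is expected to contain pre-process.py and requirements.txt.
    step_args = processor.run(
        code="pre-process.py",
        source_dir=source_dir,
        outputs=outputs,
    )

    # Pass step_args only; do not also pass processor.
    return ProcessingStep(name="Preprocess_Data", step_args=step_args)
```

Calling build_processing_step(role, BASE_DIR) needs AWS credentials and the SageMaker SDK; the returned step can then go into the steps list of a Pipeline.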

A note on dependencies:

You do not need to pass requirements.txt through dependencies in sklearn_processor.run(...). Put requirements.txt in the directory you pass as source_dir, next to pre-process.py. The FrameworkProcessor uploads that directory and installs the listed packages before the script runs. For more information, see:

https://sagemaker-examples.readthedocs.io/en/latest/sagemaker_processing/scikit_learn_data_processing_and_model_evaluation/scikit_learn_data_processing_and_model_evaluation.html#Running-processing-jobs-with-FrameworkProcessor-to-include-custom-dependencies

and

https://sagemaker.readthedocs.io/en/stable/api/training/processing.html#sagemaker.processing.FrameworkProcessor.run
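
To make the expected source_dir layout concrete, here is a small sketch that stages a directory with the script and a requirements.txt side by side. It uses only the standard library; stage_source_dir is a hypothetical helper, and the package names in the demo are placeholders, not something taken from the question.

```python
import shutil
import tempfile
from pathlib import Path


def stage_source_dir(script: Path, packages: list[str], dest: Path) -> Path:
    """Copy the processing script into dest and write requirements.txt beside it."""
    dest.mkdir(parents=True, exist_ok=True)
    shutil.copy(script, dest / script.name)
    # One requirement per line, as pip expects.
    (dest / "requirements.txt").write_text("\n".join(packages) + "\n")
    return dest


# Demo in a throwaway directory; the script body and packages are placeholders.
work = Path(tempfile.mkdtemp())
script = work / "pre-process.py"
script.write_text("print('preprocessing')\n")

staged = stage_source_dir(script, ["pandas==1.1.3", "pyarrow"], work / "src")
print(sorted(p.name for p in staged.iterdir()))
```

The resulting directory is what you would pass as source_dir, with code="pre-process.py" given relative to it.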

Upvotes: 0
