soulwreckedyouth

Reputation: 585

Get Sagemaker pipeline execution id within pipeline steps

Hello, I thought my problem was simple, but Googling for the answer showed me otherwise: within different SageMaker Pipeline steps (e.g. ClarifyCheckStep) I want to get the pipeline execution ID so I can save the output of different steps in a nice manner and structure the saving of my output. Does anyone have an idea? It seems pipeline execution variables cannot be used in string format: https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#execution-variables

Upvotes: 1

Views: 1338

Answers (2)

akshat garg

Reputation: 194

You are right that execution variables cannot be used as plain strings, but they can be used via Join (in the pipeline definition of each step), e.g. Join(on='/', values=[s3_prefix, 'predictions', ExecutionVariables.PIPELINE_EXECUTION_ID])

Here s3_prefix is something like 's3://a/b'. The code above builds a path whose last folder is named after the pipeline execution ID, so the resulting path looks like 's3://a/b/predictions/<execution_id>'.

You can pass pipeline parameters in Join as well, as shown in the sketch below.
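A minimal sketch putting the pieces together (the ParameterString and the bucket prefix are illustrative assumptions, not part of the original answer):

from sagemaker.workflow.execution_variables import ExecutionVariables
from sagemaker.workflow.functions import Join
from sagemaker.workflow.parameters import ParameterString

s3_prefix = "s3://a/b"  # base prefix, placeholder
model_name = ParameterString(name="ModelName", default_value="my-model")

# Resolves at runtime to s3://a/b/<model_name>/predictions/<execution_id>
predictions_path = Join(
    on="/",
    values=[s3_prefix, model_name, "predictions", ExecutionVariables.PIPELINE_EXECUTION_ID],
)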

Within pipeline steps you can also pass the pipeline execution ID as an environment variable or as an argument, whichever suits, e.g. via the "env" argument of a processor. Pipeline parameters can be passed in the same manner. Using this you can organize your data while executing SageMaker pipelines.
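For example, a hedged sketch of the env approach, assuming a recent sagemaker SDK where processor env values accept pipeline variables (the framework version and instance settings are placeholders):

from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.workflow.execution_variables import ExecutionVariables

processor = SKLearnProcessor(
    framework_version="1.2-1",
    role=role,  # assumed to be defined elsewhere
    instance_type="ml.m5.xlarge",
    instance_count=1,
    # The processing script can read os.environ["PIPELINE_EXECUTION_ID"] at runtime
    env={"PIPELINE_EXECUTION_ID": ExecutionVariables.PIPELINE_EXECUTION_ID},
)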

Upvotes: 1

Giuseppe La Gualano

Reputation: 1720

In order to save outputs following a common structure tied to one execution of the pipeline, the most robust method currently available is to use the code_location and output_path parameters of the various steps, first building a path that contains the pipeline_name, possibly other details, and a timestamp that guarantees uniqueness.

Then, when you get your pipeline definition (e.g., with a get_pipeline() function), you can pass the pipeline_name and other variables. An example is as follows:

import time

pipeline = your_pipeline_script.get_pipeline(
    region=region,
    role=role,
    pipeline_name=your_pipeline_name,
    pipeline_detail=some_details + "-" + time.strftime("%Y%m%d%H%M%S", time.gmtime()),
)

Your output destination may then become something like this:

outputs_destination = f"s3://{pipeline_session.default_bucket()}/pipeline/{pipeline_name}/{pipeline_detail}"

This way your path is pre-generated before the pipeline is executed, and it is controllable with whatever parameters you want to include.
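For instance, a minimal sketch of wiring the pre-generated destination into a training step's output_path and code_location (the Estimator arguments and training_image_uri are illustrative assumptions):

from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri=training_image_uri,  # assumed to be defined elsewhere
    role=role,
    instance_type="ml.m5.xlarge",
    instance_count=1,
    output_path=f"{outputs_destination}/training",  # model artifacts land here
    code_location=f"{outputs_destination}/code",    # uploaded source code lands here
    sagemaker_session=pipeline_session,
)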

One idea might be to create subfolders named after particular parameters. The important thing is that the layout follows a well-defined and easily recognizable structure.

Upvotes: 0
