hahilas

How do I pass arguments to my Python script when running a SageMaker pipeline's ProcessingStep?

I read in this documentation that ProcessingStep can accept job arguments.

I currently have a Python script containing a function, executed via ProcessingStep, that requires arguments to be passed in. I am not sure how to get the values from the "job arguments" inside the script so that I can call the function with them.

Here is a code snippet from my Python script:

def params(input_params):
    details = {
        "database": input_params[0],
        "table": input_params[1],
        "catalog": input_params[2],
        "earliestday": int(input_params[3]),
        "latestday": int(input_params[4]),
        "s3bucket": input_params[5],
        "bucketpath": input_params[6],
    }
    return details

output_params = params(input_params)  # This is where I'm not sure how to get the job arguments from the ProcessingStep so I can call my function

Here's what my processing step code looks like:

step_params = ProcessingStep(
    name="StateParams",
    processor=sklearn_processor,
    outputs=[processing_output],
    # The job arguments I hope will be passed into the function in my Python file
    job_arguments=["ABC", "SESSION_123", "AwsDataCatalog", "5", "7", "mybucket", "bucket2/tmp/athena_sagemaker"],
    code="params.py",
)

I would greatly appreciate any advice on how to use the job arguments in the ProcessingStep to call the function in my Python script. Thanks!

Answers (1)

user_5

You can pass the values to ProcessingStep as named flags in job_arguments, like this:

sample_argument_1 = "sample_arg_1"
sample_argument_2 = "sample_arg_2"

step_params = ProcessingStep(
    name="StateParams",
    processor=sklearn_processor,
    outputs=[processing_output],
    job_arguments=["--sample-argument-1", sample_argument_1, "--sample-argument-2", sample_argument_2],
    code="params.py",
)
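ProcessingStep's job_arguments expects a list of strings, which is why the question passes "5" and "7" as strings rather than ints. If you keep your values in a dict, a small helper can flatten them into the "--flag value" pairs shown above. This is a minimal sketch: `to_job_arguments` and the flag names it produces are hypothetical, not part of the SageMaker SDK.

```python
from typing import Any, Dict, List


def to_job_arguments(params: Dict[str, Any]) -> List[str]:
    """Flatten {"earliest_day": 5} into ["--earliest-day", "5"].

    Hypothetical helper: underscores in keys become hyphens in the flag
    name, and every value is converted to str because job_arguments
    is a list of strings.
    """
    args: List[str] = []
    for key, value in params.items():
        args.append("--" + key.replace("_", "-"))
        args.append(str(value))
    return args


# The values from the question, keyed by hypothetical flag names
job_args = to_job_arguments({
    "database": "ABC",
    "table": "SESSION_123",
    "catalog": "AwsDataCatalog",
    "earliest_day": 5,
    "latest_day": 7,
    "s3_bucket": "mybucket",
    "bucket_path": "bucket2/tmp/athena_sagemaker",
})
print(job_args)
```

The resulting list can then be passed as `job_arguments=job_args` in the ProcessingStep call.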

In your params.py file, you can then read the arguments with argparse:

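import argparse

# Declare one option per flag passed in job_arguments;
# dest is the attribute name used to read the value back
parser = argparse.ArgumentParser()
parser.add_argument('--sample-argument-1', type=str, dest='sample_argument_1')
parser.add_argument('--sample-argument-2', type=str, dest='sample_argument_2')

# parse_args() reads the command line the processing job passes to the script
args = parser.parse_args()
arg_1 = args.sample_argument_1
print("arg_1:")
print(arg_1)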
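Applied to the question's seven values, the same argparse approach can build the dict that `params()` returned, with `type=int` doing the `int(...)` conversion. This is a sketch assuming hypothetical flag names (`--earliest-day`, `--s3-bucket`, and so on); the `dest` values are chosen to match the question's dict keys.

```python
import argparse
from typing import Any, Dict, List, Optional


def parse_job_arguments(argv: Optional[List[str]] = None) -> Dict[str, Any]:
    """Parse the job arguments into the dict the question's params() builds.

    With argv=None, argparse reads sys.argv[1:], i.e. the real command line.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--database", type=str, required=True)
    parser.add_argument("--table", type=str, required=True)
    parser.add_argument("--catalog", type=str, required=True)
    # type=int turns the string "5" into the integer 5, like int(...) in params()
    parser.add_argument("--earliest-day", type=int, dest="earliestday", required=True)
    parser.add_argument("--latest-day", type=int, dest="latestday", required=True)
    parser.add_argument("--s3-bucket", type=str, dest="s3bucket", required=True)
    parser.add_argument("--bucket-path", type=str, dest="bucketpath", required=True)
    return vars(parser.parse_args(argv))


# Explicit list for demonstration; these would be the job_arguments strings
details = parse_job_arguments([
    "--database", "ABC",
    "--table", "SESSION_123",
    "--catalog", "AwsDataCatalog",
    "--earliest-day", "5",
    "--latest-day", "7",
    "--s3-bucket", "mybucket",
    "--bucket-path", "bucket2/tmp/athena_sagemaker",
])
print(details)
```

Inside params.py you would call `parse_job_arguments()` with no argument so it reads the arguments the processing job passes in. If you would rather keep the positional list from the question unchanged, the job arguments are appended to the script's command line, so `params(sys.argv[1:])` (after `import sys`) should also work.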
