Reputation: 13
I am using the AWS Step Functions Data Science SDK with Python. I have a task that runs every day, and the path of the data accessed in certain steps of the workflow changes daily because it contains a date parameter.
How can I pass the date parameter when I execute the step function, and use it so that the new data is picked up automatically every day?
This is an example of a step I am adding to the workflow:
etl_step = steps.GlueStartJobRunStep(
    'Extract, Transform, Load',
    parameters={
        "JobName": execution_input['GlueJobName'],
        "Arguments": {
            '--S3_SOURCE': data_source,
            '--S3_DEST': 's3a://{}/{}/'.format(bucket, project_name),
            '--TRAIN_KEY': train_prefix + '/',
            '--VAL_KEY': val_prefix + '/'
        }
    }
)
I want to add the date variable to S3_DEST. If I pass it through execution_input, the value is not a string (it is a placeholder object), so I cannot concatenate it into the path.
Upvotes: 1
Views: 4446
Reputation: 35146
Edit
If the date is a datetime object, you can use datetime.strftime('%Y-%m-%d') to output it as a string.
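As a minimal sketch of that conversion, the snippet below formats a date and appends it to the destination path from the question. The helper name dated_s3_dest and the choice to put the date as the last path segment are assumptions for illustration, not something stated in the question.

```python
from datetime import date


def dated_s3_dest(bucket, project_name, run_date):
    """Return an S3 destination path ending in the run date (YYYY-MM-DD).

    Hypothetical helper: the layout bucket/project/date is an assumption.
    """
    date_str = run_date.strftime('%Y-%m-%d')  # date or datetime -> plain str
    return 's3a://{}/{}/{}/'.format(bucket, project_name, date_str)


# Placeholder bucket and project names, for illustration only.
print(dated_s3_dest('my-bucket', 'my-project', date(2021, 3, 14)))
# prints: s3a://my-bucket/my-project/2021-03-14/
```

Because the result is an ordinary string, it can be built before the execution starts and handed to the workflow as a single value, which avoids concatenating with a placeholder.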
Original
Step Functions executions accept input.
If you start the execution through the SDK's start_execution call, you can pass the value in the input parameter.
If you trigger the workflow from a CloudWatch Events rule, you can specify a constant input from the console.
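One way this could look with boto3 is sketched below: the caller builds the full dated path and sends it as part of the JSON input to start_execution. The key name S3Dest, both function names, and the path layout are hypothetical; GlueJobName is the key already used in the question. The state machine ARN is whatever your deployed workflow uses.

```python
import json
from datetime import date


def build_execution_input(glue_job_name, s3_dest):
    """Serialize the execution input; start_execution expects a JSON string."""
    # 'S3Dest' is a hypothetical key name chosen for this sketch.
    return json.dumps({'GlueJobName': glue_job_name, 'S3Dest': s3_dest})


def start_daily_execution(state_machine_arn, glue_job_name, bucket,
                          project_name, run_date=None):
    """Start one execution whose input carries today's dated S3 path."""
    import boto3  # imported lazily so build_execution_input runs without AWS

    run_date = run_date or date.today()
    # Assumed layout: the date is appended as the last path segment.
    s3_dest = 's3a://{}/{}/{}/'.format(
        bucket, project_name, run_date.strftime('%Y-%m-%d'))
    client = boto3.client('stepfunctions')
    return client.start_execution(
        stateMachineArn=state_machine_arn,
        input=build_execution_input(glue_job_name, s3_dest),
    )
```

On the workflow side, the step could then reference execution_input['S3Dest'] directly for '--S3_DEST' instead of formatting the path inside the definition, assuming that key is declared in the ExecutionInput schema.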
Upvotes: 1