Reputation: 11691
I am trying to use AWS Step Functions to trigger operations on many S3 files via Lambda. To do this I am invoking a step function with an input that has the base S3 key of the files and the part name of each file (each parallel iteration would operate on a different S3 file). The input looks something like
{
"job-spec": {
"base_file_name": "some_s3_key-",
"part_array": [
"part-0000.tsv",
"part-0001.tsv",
"part-0002.tsv", ...
]
}
}
My step function is very simple: it takes that input and maps over it. However, I can't seem to get both the base file name and the array entry as input to my lambda. Here is my step function definition
{
"Comment": "An example of the Amazon States Language using a map state to process elements of an array with a max concurrency of 2.",
"StartAt": "Map",
"States": {
"Map": {
"Type": "Map",
"ItemsPath": "$.job-spec",
"ResultPath": "$.part_array",
"MaxConcurrency": 2,
"Next": "Final State",
"Iterator": {
"StartAt": "My Stage",
"States": {
"My Stage": {
"Type": "Task",
"Resource": "arn:aws:states:::lambda:invoke",
"Parameters": {
"FunctionName": "arn:aws:lambda:us-east-1:<>:function:some-lambda:$LATEST",
"Payload": {
"Input.$": "$.part_array"
}
},
"End": true
}
}
}
},
"Final State": {
"Type": "Pass",
"End": true
}
}
}
As written above it complains that job-spec is not an array for the ItemsPath. If I change that to $.job-spec.part_array I get the array entries I'm looking for in my lambda, but the base key is missing.
Essentially I want each Python lambda to get the base file key and one entry from the array, so it can stitch together the complete file name. I can't just put the complete file names in the array because of the limit on how much data I can pass around in Step Functions, and repeating the base key in every entry also seems like a waste of data.
It looks like the Parameters value can be used for this, but I can't quite get the syntax right.
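For concreteness, here is a minimal sketch of what each lambda would do once it receives both values. The event key names base_file_name and part_name and the helper build_s3_key are assumptions made for illustration, and the sketch assumes the values arrive at the top level of the event rather than nested under another key.

```python
def build_s3_key(base_file_name: str, part_name: str) -> str:
    """Join the shared base key and one part name into a full S3 key."""
    return f"{base_file_name}{part_name}"


def lambda_handler(event, context):
    # Assumed event shape (hypothetical key names):
    # {"base_file_name": "some_s3_key-", "part_name": "part-0000.tsv"}
    key = build_s3_key(event["base_file_name"], event["part_name"])
    # The real per-file work would go here; this sketch only returns the key.
    return {"s3_key": key}
```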
Upvotes: 29
Views: 24555
Reputation: 1696
This was a real PITA. Here is an example with AWS CDK:
const mapBlock = new sfn.Map(this, "processLoop", {
// Pick your array path
itemsPath: sfn.JsonPath.stringAt("$.uploadedFiles"),
// Use this to manipulate data going into each loop
parameters: {
// Now we can use $$.Map.Item.Value to get the current item value
item: sfn.JsonPath.stringAt("$$.Map.Item.Value"),
// Any additional info you want from the map block input
collection: sfn.JsonPath.stringAt("$.collectionName"),
bucket: sfn.JsonPath.stringAt("$.bucket"),
},
});
Upvotes: 2
Reputation: 11691
Was able to finally get the syntax right.
"ItemsPath": "$.job-spec.part_array",
"Parameters": {
"part_name.$": "$$.Map.Item.Value",
"base_file_name.$": "$.job-spec.base_file_name"
},
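It seems that Parameters can be used to build a custom input for each iteration. The $$ prefix reads from the context object rather than from the state's actual input. It appears that ItemsPath selects the array, and the Map state then exposes the current element through the context as $$.Map.Item.Value, which Parameters can combine with other fields from the input.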
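To make the data flow easier to picture, here is a plain-Python model of what each iteration ends up receiving with the ItemsPath and Parameters above. It is only an approximation for illustration, not how Step Functions is implemented, and the function name map_iteration_inputs is made up for this sketch.

```python
def map_iteration_inputs(state_input: dict) -> list:
    """Approximate the per-iteration inputs built by ItemsPath + Parameters."""
    job_spec = state_input["job-spec"]
    items = job_spec["part_array"]  # ItemsPath: $.job-spec.part_array
    inputs = []
    for index, value in enumerate(items):
        # Stand-in for the context object that $$ refers to in each iteration.
        ctx = {"Map": {"Item": {"Index": index, "Value": value}}}
        inputs.append({
            "part_name": ctx["Map"]["Item"]["Value"],      # $$.Map.Item.Value
            "base_file_name": job_spec["base_file_name"],  # $.job-spec.base_file_name
        })
    return inputs


example_input = {
    "job-spec": {
        "base_file_name": "some_s3_key-",
        "part_array": ["part-0000.tsv", "part-0001.tsv"],
    }
}
per_iteration = map_iteration_inputs(example_input)
```

Each element of per_iteration is the small object one iteration would see, so the base key is sent alongside a single part name instead of being repeated inside the array.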
UPDATE: Here is some AWS documentation showing this being used (from the comments below).
Upvotes: 52