sedavidw

Reputation: 11691

Pass multiple inputs into Map State in AWS Step Function

I am trying to use AWS Step Functions to trigger operations on many S3 files via Lambda. To do this I invoke a step function with an input that contains the base S3 key of the files and the part name of each file (each parallel iteration operates on a different S3 file). The input looks something like this:

    {
      "job-spec": {
        "base_file_name": "some_s3_key-",
        "part_array": [
          "part-0000.tsv",
          "part-0001.tsv",
          "part-0002.tsv", ...
        ]
      }
    }

My step function is very simple: it takes that input and maps over it. However, I can't seem to get both the base file name and the array entry as input to my Lambda. Here is my step function definition:

    {
      "Comment": "An example of the Amazon States Language using a map state to process elements of an array with a max concurrency of 2.",
      "StartAt": "Map",
      "States": {
        "Map": {
          "Type": "Map",
          "ItemsPath": "$.job-spec",
          "ResultPath": "$.part_array",
          "MaxConcurrency": 2,
          "Next": "Final State",
          "Iterator": {
            "StartAt": "My Stage",
            "States": {
              "My Stage": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {
                  "FunctionName": "arn:aws:lambda:us-east-1:<>:function:some-lambda:$LATEST",
                  "Payload": {
                    "Input.$": "$.part_array"
                  }
                },
                "End": true
              }
            }
          }
        },
        "Final State": {
          "Type": "Pass",
          "End": true
        }
      }
    }

As written above, it complains that job-spec is not an array for the ItemsPath. If I change that to $.job-spec.part_array, I get the array I'm looking for in my Lambda, but the base key is missing.

Essentially, I want each Python Lambda to get the base file key and one entry from the array, so that it can stitch together the complete file name. I can't just put the complete file names in the array because of the limit on how much data can be passed around in Step Functions, and that also seems like a waste of data.

It looks like the Parameters field can be used for this, but I can't quite get the syntax right.

Upvotes: 29

Views: 24555

Answers (2)

Wesley Cheek

Reputation: 1696

This was a real PITA. Here is an example with AWS CDK:

    // Import path assumes CDK v2; adjust it for your setup
    import * as sfn from "aws-cdk-lib/aws-stepfunctions";

    const mapBlock = new sfn.Map(this, "processLoop", {
      // Pick your array path
      itemsPath: sfn.JsonPath.stringAt("$.uploadedFiles"),

      // Use this to manipulate the data going into each loop iteration
      parameters: {
        // $$.Map.Item.Value gives the current item value
        item: sfn.JsonPath.stringAt("$$.Map.Item.Value"),

        // Any additional info you want from the map block input
        collection: sfn.JsonPath.stringAt("$.collectionName"),
        bucket: sfn.JsonPath.stringAt("$.bucket"),
      },
    });

Upvotes: 2

sedavidw

Reputation: 11691

I was finally able to get the syntax right.

    "ItemsPath": "$.job-spec.part_array",
    "Parameters": {
      "part_name.$": "$$.Map.Item.Value",
      "base_file_name.$": "$.job-spec.base_file_name"
    },

It seems that Parameters can be used to build a custom input for each iteration. The $$ prefix reads from the context object rather than from the actual state input. It appears that ItemsPath selects the array, and the Map state then exposes each element through the context as $$.Map.Item.Value, where it can be used later.
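To make the effect of the snippet above concrete, here is a rough plain-Python imitation of what the Map state appears to do with ItemsPath and Parameters. It is only an illustration and not how Step Functions is implemented; the function name `build_iteration_inputs` and the sample input are made up for this sketch.

```python
def build_iteration_inputs(state_input):
    """Imitate ItemsPath + Parameters from the snippet above (illustration only)."""
    job_spec = state_input["job-spec"]
    items = job_spec["part_array"]  # ItemsPath: $.job-spec.part_array

    iteration_inputs = []
    for index, value in enumerate(items):
        # Rough stand-in for the context object that "$$" reads from
        context = {"Map": {"Item": {"Index": index, "Value": value}}}
        iteration_inputs.append({
            # "part_name.$": "$$.Map.Item.Value"
            "part_name": context["Map"]["Item"]["Value"],
            # "base_file_name.$": "$.job-spec.base_file_name"
            "base_file_name": job_spec["base_file_name"],
        })
    return iteration_inputs


# Sample input modelled on the question (hypothetical values)
example_input = {
    "job-spec": {
        "base_file_name": "some_s3_key-",
        "part_array": ["part-0000.tsv", "part-0001.tsv", "part-0002.tsv"],
    }
}

iteration_inputs = build_iteration_inputs(example_input)
```

Each element of `iteration_inputs` corresponds to the input one iteration would see: a single part name together with the shared base file name, so the base key is stored once in the execution input instead of being repeated in every array entry.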

UPDATE: Here is some AWS documentation showing this being used (from the comments below).
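On the Lambda side, a minimal sketch of the handler might look like the following. It assumes the Task state forwards the whole iteration input as the payload (for example with `"Payload.$": "$"` instead of the `Input` wrapper used in the question), so that `part_name` and `base_file_name` arrive as top-level keys of the event. The returned `s3_key` field is a hypothetical name chosen for this example.

```python
def lambda_handler(event, context):
    # Assumption: the event is the per-iteration input built by the Map
    # state's Parameters, i.e. it has "base_file_name" and "part_name".
    s3_key = event["base_file_name"] + event["part_name"]

    # The actual work on the S3 object would go here; this sketch only
    # returns the stitched key ("s3_key" is a hypothetical field name).
    return {"s3_key": s3_key}


# Local check with a hand-written event (no AWS calls involved)
result = lambda_handler(
    {"base_file_name": "some_s3_key-", "part_name": "part-0000.tsv"},
    None,
)
```

If the `Input` wrapper from the question is kept in the payload, the two values would sit one level deeper in the event, and the handler would need to read them from there.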

Upvotes: 52
