A.S.

Reputation: 191

Is there any way to read the contents of an S3 file from an AWS Step Function?

I have a particular workflow where I want to pass a list of 500 JSON strings from a Lambda function to a Step Function (stepFunction1) and then iterate over the list in that Step Function's Map state. From there, I want to pass each item in the list to a separate Step Function (stepFunction2), where additional work will be done.

My problem is that the list of 500 JSON strings exceeds the AWS service limit on execution input size when passed to stepFunction1. I have tried splitting the list into several smaller segments, but this leads to several invocations of stepFunction1 running concurrently, which I can't have due to other limitations. My next idea was to store the list of JSON strings in an S3 bucket, access it from stepFunction1, and then iterate through it from there. Is there any way to achieve this? Is it possible to read a file in S3 from an AWS state machine? I'm a bit stumped here.
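
For reference, a simplified version of the producing Lambda (the bucket name, key, and state machine ARN are placeholders, and the list contents are stand-ins for my actual data):

import json

import boto3

s3 = boto3.client("s3")
sfn = boto3.client("stepfunctions")

def handler(event, context):
    # Stand-in for the real list of ~500 JSON strings.
    items = [json.dumps({"id": i}) for i in range(500)]

    # The list is too large to pass as execution input, so upload it
    # to S3 and pass only the object location instead.
    bucket = "my-bucket"
    key = "batches/items.json"
    s3.put_object(Bucket=bucket, Key=key, Body=json.dumps(items))

    # Start stepFunction1 with just the bucket and key as input.
    sfn.start_execution(
        stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:stepFunction1",
        input=json.dumps({"bucket": bucket, "key": key}),
    )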

Upvotes: 10

Views: 16827

Answers (5)

Kotaro Doi

Reputation: 151

You can use the S3 GetObject API. It reads your JSON file stored in S3 as a string under the Body field of the state output, which you can then convert to JSON in a ResultSelector with the intrinsic function States.StringToJson, like "myJson.$": "States.StringToJson($.Body)".

A full example could look like this:

{
  "StartAt": "GetObject",
  "States": {
    "GetObject": {
      "Type": "Task",
      "Parameters": {
        "Bucket": "<YOUR S3 Bucket Name>",
        "Key": "<YOUR JSON File Name>"
      },
      "Resource": "arn:aws:states:::aws-sdk:s3:getObject",
      "End": true,
      "ResultSelector": {
        "myJson.$": "States.StringToJson($.Body)"
      }
    }
  },
  "Comment": "S3 -> JSON",
  "TimeoutSeconds": 60
}

Upvotes: 15

lexhuismans

Reputation: 99

If you want to get the object based on an S3 event (for example, an EventBridge event, which carries the bucket and key under $.detail), you can use something like:

{
  "StartAt": "GetObject",
  "States": {
    "GetObject": {
      "Type": "Task",
      "Parameters": {
        "Bucket.$": "$.detail.bucket.name",
        "Key.$": "$.detail.object.key"
      },
      "Resource": "arn:aws:states:::aws-sdk:s3:getObject",
      "End": true,
      "ResultSelector": {
        "myJson.$": "States.StringToJson($.Body)"
      }
    }
  },
  "Comment": "S3 -> JSON",
  "TimeoutSeconds": 60
}

Upvotes: 0

Michael Aicher

Reputation: 901

In addition to Kotaro Doi's solution, I want to show how to use the Step Function's execution input to supply the Parameters for the bucket name and object key.
For example, if you trigger the Step Function from a specific event (an S3 file upload in my case), you receive the following input:

{
  "Records": [
    {
      "eventVersion": "2.1",
      "eventSource": "aws:s3",
      "awsRegion": "eu-central-1",
      "eventTime": "2022-07-14T11:05:25.410Z",
      "eventName": "ObjectCreated:Put",
      "userIdentity": {
        "principalId": "not-relevant"
      },
      "requestParameters": {
        "sourceIPAddress": "127.0.0.1"
      },
      "responseElements": {
        "x-amz-request-id": "",
        "x-amz-id-2": ""
      },
      "s3": {
        "s3SchemaVersion": "1.0",
        "configurationId": "not-relevant",
        "bucket": {
          "name": "s3-bucket-name",
          "ownerIdentity": {
            "principalId": "not-relevant"
          },
          "arn": "arn:aws:s3:::s3-bucket-name"
        },
        "object": {
          "key": "path/to/json/file.json",
          "size": 92,
          "eTag": "not-relevant",
          "versionId": "not-relevant",
          "sequencer": "not-relevant"
        }
      }
    }
  ]
}

With that input, the bucket name, its ARN, and the required object key are already available. Accordingly, you can reference them in the Parameters section in the following way:

{
  "StartAt": "GetObject",
  "States": {
    "GetObject": {
      "Type": "Task",
      "Parameters": {
        "Bucket.$": "$$.Execution.Input['Records'][0]['s3']['bucket']['name']",
        "Key.$": "$$.Execution.Input['Records'][0]['s3']['object']['key']"
      },
      "Resource": "arn:aws:states:::aws-sdk:s3:getObject",
      "End": true,
      "ResultSelector": {
        "myJson.$": "States.StringToJson($.Body)"
      }
    }
  },
  "Comment": "S3 -> JSON",
  "TimeoutSeconds": 60
}

Upvotes: 1

Traycho Ivanov

Reputation: 3217

Step Functions works very well with AWS Lambda functions, so you can easily design a nice workflow.

You could read the S3 file from a Lambda. That way, your Lambda can work on its own and also be part of a Step Function.

I would advise you to first create a single Lambda function that reads and processes the S3 file, and then try it within a Step Function if that fits your scenario.
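
A minimal sketch of such a Lambda could look like this (the bucket and key are placeholders; in practice they might come from the triggering event):

import json

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    # Read the JSON file from S3 (bucket and key are placeholders).
    obj = s3.get_object(Bucket="my-bucket", Key="batches/items.json")
    items = json.loads(obj["Body"].read())

    # Process the items here, or return them so a downstream
    # Step Function state can work with the result.
    return {"count": len(items)}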

Upvotes: -1

Dennis Traub

Reputation: 51654

One solution is to store the items in an Amazon DynamoDB table and directly access them from AWS Step Functions.

Here's an example of how to retrieve an item from DynamoDB:

"Read Next Message from DynamoDB": {
  "Type": "Task",
  "Resource": "arn:aws:states:::dynamodb:getItem",
  "Parameters": {
    "TableName": "MyTable",
    "Key": {
      "MessageId": {"S.$": "$.List[0]"}
    }
  },
  "ResultPath": "$.DynamoDB",
  "Next": "Do something"
}

You can find more information about calling DynamoDB APIs with Step Functions in the documentation.
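
A sketch of loading the items into the table up front could look like this (the table name matches the example above; the Payload attribute name is an assumption):

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("MyTable")  # table name from the example above

def load_items(json_strings):
    # Write each JSON string as its own item. batch_writer batches the
    # requests and retries unprocessed items automatically.
    with table.batch_writer() as batch:
        for i, s in enumerate(json_strings):
            batch.put_item(Item={"MessageId": str(i), "Payload": s})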

Upvotes: -1
