Reputation: 191
I have a particular workflow where I want to pass a list of 500 JSON strings from a Lambda function to a step function (stepFunction1), and then iterate over the list in that step function's Map state. From there, I want to pass each item in the list to a separate step function (stepFunction2) where additional work will be done.
My problem is that my list of 500 JSON strings exceeds the AWS service limit when passed to stepFunction1. I have tried splitting the list into several smaller segments, but this leads to several invocations of stepFunction1 running concurrently, which I can't have due to other limitations. My next idea was to store the list of JSON strings in an S3 bucket, access it from stepFunction1, and then iterate through it from there. Is there any way to achieve this? Is it possible to read a file in S3 from an AWS state machine? I'm a bit stumped here.
Upvotes: 10
Views: 16827
Reputation: 151
You can use the S3 GetObject API. It returns the contents of your JSON file as a string in the Body field of the state output, so you can then convert it to JSON in ResultSelector with the intrinsic function States.StringToJson, like "myJson.$": "States.StringToJson($.Body)".
The code example could be:
{
  "StartAt": "GetObject",
  "States": {
    "GetObject": {
      "Type": "Task",
      "Parameters": {
        "Bucket": "<YOUR S3 Bucket Name>",
        "Key": "<YOUR JSON File Name>"
      },
      "Resource": "arn:aws:states:::aws-sdk:s3:getObject",
      "End": true,
      "ResultSelector": {
        "myJson.$": "States.StringToJson($.Body)"
      }
    }
  },
  "Comment": "S3 -> JSON",
  "TimeoutSeconds": 60
}
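To connect this back to the question, here is a rough sketch of how the parsed list might then be fed into a Map state that starts stepFunction2 once per item. It assumes the S3 object contains a JSON array; the state names ForEachItem and RunStepFunction2, the items and item keys, and the <stepFunction2 ARN> placeholder are illustrative, not taken from the question. MaxConcurrency is set to 1 only as an example of limiting parallelism. Note that the parsed Body still becomes state output, so it remains subject to the Step Functions payload size limit.

{
  "Comment": "Sketch: S3 -> JSON array -> Map -> stepFunction2 (names are hypothetical)",
  "StartAt": "GetObject",
  "States": {
    "GetObject": {
      "Type": "Task",
      "Resource": "arn:aws:states:::aws-sdk:s3:getObject",
      "Parameters": {
        "Bucket": "<YOUR S3 Bucket Name>",
        "Key": "<YOUR JSON File Name>"
      },
      "ResultSelector": {
        "items.$": "States.StringToJson($.Body)"
      },
      "Next": "ForEachItem"
    },
    "ForEachItem": {
      "Type": "Map",
      "ItemsPath": "$.items",
      "MaxConcurrency": 1,
      "ItemProcessor": {
        "ProcessorConfig": {
          "Mode": "INLINE"
        },
        "StartAt": "RunStepFunction2",
        "States": {
          "RunStepFunction2": {
            "Type": "Task",
            "Resource": "arn:aws:states:::states:startExecution.sync:2",
            "Parameters": {
              "StateMachineArn": "<stepFunction2 ARN>",
              "Input": {
                "item.$": "$"
              }
            },
            "End": true
          }
        }
      },
      "End": true
    }
  }
}

With this shape, only one execution of stepFunction1 is started, and each element of the array arrives in stepFunction2 under the item key of its input. The execution role would also need permission to read the object and to start stepFunction2.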
Upvotes: 15
Reputation: 99
If you want to get the object based on an S3 event (here, one delivered through EventBridge, where the bucket name and object key sit under detail), you can use something like:
{
  "StartAt": "GetObject",
  "States": {
    "GetObject": {
      "Type": "Task",
      "Parameters": {
        "Bucket.$": "$.detail.bucket.name",
        "Key.$": "$.detail.object.key"
      },
      "Resource": "arn:aws:states:::aws-sdk:s3:getObject",
      "End": true,
      "ResultSelector": {
        "myJson.$": "States.StringToJson($.Body)"
      }
    }
  },
  "Comment": "S3 -> JSON",
  "TimeoutSeconds": 60
}
Upvotes: 0
Reputation: 901
In addition to Kotaro Doi's solution, I want to show how to use the execution input of the Step Function as the Parameters values for the bucket name and object key.
For example, if you trigger the Step Function from a specific event (an S3 file upload in my case), you get the following input:
{
  "Records": [
    {
      "eventVersion": "2.1",
      "eventSource": "aws:s3",
      "awsRegion": "eu-central-1",
      "eventTime": "2022-07-14T11:05:25.410Z",
      "eventName": "ObjectCreated:Put",
      "userIdentity": {
        "principalId": "not-relevant"
      },
      "requestParameters": {
        "sourceIPAddress": "127.0.0.1"
      },
      "responseElements": {
        "x-amz-request-id": "",
        "x-amz-id-2": ""
      },
      "s3": {
        "s3SchemaVersion": "1.0",
        "configurationId": "not-relevant",
        "bucket": {
          "name": "s3-bucket-name",
          "ownerIdentity": {
            "principalId": "not-relevant"
          },
          "arn": "arn:aws:s3:::s3-bucket-name"
        },
        "object": {
          "key": "path/to/json/file.json",
          "size": 92,
          "eTag": "not-relevant",
          "versionId": "not-relevant",
          "sequencer": "not-relevant"
        }
      }
    }
  ]
}
That input already contains the bucket name, its ARN, and the required object key. You can therefore reference them in the Parameters section through the context object ($$.Execution.Input) as follows:
{
  "StartAt": "GetObject",
  "States": {
    "GetObject": {
      "Type": "Task",
      "Parameters": {
        "Bucket.$": "$$.Execution.Input['Records'][0]['s3']['bucket']['name']",
        "Key.$": "$$.Execution.Input['Records'][0]['s3']['object']['key']"
      },
      "Resource": "arn:aws:states:::aws-sdk:s3:getObject",
      "End": true,
      "ResultSelector": {
        "myJson.$": "States.StringToJson($.Body)"
      }
    }
  },
  "Comment": "S3 -> JSON",
  "TimeoutSeconds": 60
}
Upvotes: 1
Reputation: 3217
Step Functions works very well with AWS Lambda functions, so you can design a workflow around them easily.
You could read the S3 file from a Lambda function. That function can run on its own and also serve as one step of a step function.
I would advise you to first create a single Lambda function that reads and processes the S3 file, and then try it within a step function to see whether it fits your scenario.
Upvotes: -1
Reputation: 51654
One solution is to store the items in an Amazon DynamoDB table and directly access them from AWS Step Functions.
Here's an example of how to retrieve an item from DynamoDB:
"Read Next Message from DynamoDB": {
  "Type": "Task",
  "Resource": "arn:aws:states:::dynamodb:getItem",
  "Parameters": {
    "TableName": "MyTable",
    "Key": {
      "MessageId": {"S.$": "$.List[0]"}
    }
  },
  "ResultPath": "$.DynamoDB",
  "Next": "Do something"
}
You can find more information about calling DynamoDB APIs with Step Functions in the documentation.
Upvotes: -1