Ocean Sun

Reputation: 101

How to use Map function of Step Functions to parse a JSON array inside a JSON file

I am trying to analyze data inside a huge JSON file (about 4 GB), and my current solution is:

  1. Set up an AWS EventBridge rule that triggers the Step Functions state machine whenever a JSON file is uploaded to an S3 bucket.

  2. The state machine has a single step: a Map state (containing a Lambda function) that gets the bucket name and key from the EventBridge event and then reads the items from the JSON array. (I just dragged and dropped the "Process JSON file in S3" pattern from Patterns in Workflow Studio.)

    I also set the ItemsPath property in the ASL to the path of the JSON array inside the JSON file.

    The ASL so far looks like:

    {
      "Comment": "A description of my state machine",
      "StartAt": "Json File Analysis",
      "States": {
        "Json File Analysis": {
          "Type": "Map",
          "ItemsPath": "$.users",
          "ItemProcessor": {
            "ProcessorConfig": {
              "Mode": "DISTRIBUTED",
              "ExecutionType": "STANDARD"
            },
            "StartAt": "Decode Json Node",
            "States": {
              "Decode Json Node": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "OutputPath": "$.Payload",
                "Parameters": {
                  "Payload.$": "$",
                  "FunctionName": "arn:aws:lambda:eu-east-1:1234567890:function:user-data-analysis-lambda:$LATEST"
                },
                "Retry": [
                  {
                    "ErrorEquals": [
                      "Lambda.ServiceException",
                      "Lambda.AWSLambdaException",
                      "Lambda.SdkClientException",
                      "Lambda.TooManyRequestsException"
                    ],
                    "IntervalSeconds": 1,
                    "MaxAttempts": 3,
                    "BackoffRate": 2
                  }
                ],
                "End": true
              }
            }
          },
          "ItemReader": {
            "Resource": "arn:aws:states:::s3:getObject",
            "ReaderConfig": {
              "InputType": "JSON"
            },
            "Parameters": {
              "Bucket.$": "$.detail.bucket.name",
              "Key.$": "$.detail.object.key"
            }
          },
          "MaxConcurrency": 1000,
          "Label": "JsonFileAnalysis",
          "End": true
        }
      }
    }
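The ItemReader's "Bucket.$" and "Key.$" parameters are evaluated against the state's input, which here is the EventBridge event, not against the contents of the uploaded file. A minimal sketch of that lookup, assuming a trimmed-down S3 "Object Created" event (the bucket and key values are hypothetical) and a toy resolver that only understands simple dotted paths:

```python
# Trimmed-down EventBridge "Object Created" event for S3. Bucket and key
# values are hypothetical placeholders; real events carry more fields.
SAMPLE_EVENT = {
    "version": "0",
    "detail-type": "Object Created",
    "source": "aws.s3",
    "detail": {
        "bucket": {"name": "example-bucket"},
        "object": {"key": "uploads/users.json"},
    },
}


def resolve(path: str, document):
    """Resolve a simple dotted reference path such as "$.detail.bucket.name".

    Toy resolver for illustration: it only understands "$" followed by
    dot-separated field names, unlike the full JsonPath syntax.
    """
    if not path.startswith("$"):
        raise ValueError(f"path must start with '$': {path!r}")
    node = document
    for field in path[1:].split(".")[1:]:
        if not isinstance(node, dict) or field not in node:
            raise KeyError(f"{path!r} does not exist in the input")
        node = node[field]
    return node


# These mirror the "Bucket.$" and "Key.$" parameters of the ItemReader.
bucket = resolve("$.detail.bucket.name", SAMPLE_EVENT)
key = resolve("$.detail.object.key", SAMPLE_EVENT)
print(bucket, key)  # example-bucket uploads/users.json
```

By contrast, resolve("$.users", SAMPLE_EVENT) raises KeyError: the users array lives inside the S3 object, which the event only points to.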
    

However, after testing I found some strange behavior:

  1. If I upload a JSON file whose structure looks like this:

    {
        "datetime": "datetime",
        "users": [{UserDataJsonObject}, {UserDataJsonObject}, {UserDataJsonObject}...]
    }
    

    The Step Functions execution fails with the error: Attempting to map over non-iterable node.

  2. If I upload a JSON file that is a top-level JSON array like this:

    [
        {UserDataJsonObject},
        {UserDataJsonObject},
        {UserDataJsonObject}
        ...
    ]
    

    The execution succeeds.
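The two results are consistent with the JSON ItemReader iterating over the top-level value of the S3 object, with ItemsPath not being applied to the file's contents. The toy model below reproduces that interpretation of the observed behavior; it is plain Python, not AWS code, and the sample values are made up:

```python
import json


def read_items(file_text: str) -> list:
    """Toy model of the observed ItemReader behavior: the parsed file
    must itself be an array, otherwise there is nothing to iterate over."""
    data = json.loads(file_text)
    if not isinstance(data, list):
        raise TypeError("Attempting to map over non-iterable node")
    return data


# Shape from test 1: the array is nested under "users".
wrapped = '{"datetime": "2023-01-01T00:00:00Z", "users": [{"id": 1}, {"id": 2}]}'
# Shape from test 2: the array is the root of the file.
bare = '[{"id": 1}, {"id": 2}]'

print(read_items(bare))  # [{'id': 1}, {'id': 2}]
try:
    read_items(wrapped)
except TypeError as err:
    print(err)  # Attempting to map over non-iterable node
```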

I'm confused about why this happens, since I have already set ItemsPath. The structure of the JSON file cannot be changed. How can I resolve this issue?

Upvotes: 0

Views: 1575

Answers (1)

RJ Girish

Reputation: 1

The issue seems to be related to the structure of the JSON file and how Step Functions reads it. The "Attempting to map over non-iterable node" error means that the node Step Functions tried to iterate over is not an array. In the first case, where the JSON file has a structure like:

{
    "datetime": "datetime",
    "users": [{UserDataJsonObject}, {UserDataJsonObject}, {UserDataJsonObject}...]
}

the root of the file is an object, not an array. Even with "ItemsPath": "$.users" set, Step Functions does not find an iterable node to map over, which causes the error. In the second case, where the JSON file is a top-level JSON array like:

[
    {UserDataJsonObject},
    {UserDataJsonObject},
    {UserDataJsonObject}
    ...
]

Step Functions works because the root of the file is itself an iterable array.

To resolve the issue, you would need to either change the structure of the JSON file or change the state machine so that the node being mapped over is an array. Since the structure of the file cannot be changed, one option is to preprocess the file before the Map state runs, extracting the users array so that the Map state reads a file whose root is that array. Another option is to do the JSON parsing and processing yourself in a custom Lambda function instead of relying on the Map state to read the file.
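A minimal sketch of the preprocessing option, assuming a hypothetical Lambda function run as a Task state before the Map state, and a hypothetical OUTPUT_BUCKET environment variable naming where the rewritten file goes. The pure extract_items function does the reshaping; handler wires it to S3 with boto3:

```python
import json
import os


def extract_items(raw: bytes, items_key: str = "users") -> bytes:
    """Re-serialize the array stored under `items_key` as a top-level JSON array.

    A top-level array is the file shape that worked with the Map state's
    ItemReader in the question's second test.
    """
    document = json.loads(raw)
    items = document.get(items_key) if isinstance(document, dict) else None
    if not isinstance(items, list):
        raise ValueError(f"expected a JSON array under {items_key!r}")
    return json.dumps(items).encode("utf-8")


def handler(event, context):
    """Hypothetical preprocessing Lambda, run as a Task state before the Map.

    Reads the uploaded object named in the EventBridge event, writes the
    extracted array to OUTPUT_BUCKET (a hypothetical environment variable)
    under the same key, and returns the new location.
    """
    import boto3  # bundled with the AWS Lambda Python runtime

    s3 = boto3.client("s3")
    bucket = event["detail"]["bucket"]["name"]
    key = event["detail"]["object"]["key"]
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()

    out_bucket = os.environ["OUTPUT_BUCKET"]
    s3.put_object(
        Bucket=out_bucket,
        Key=key,
        Body=extract_items(body),
        ContentType="application/json",
    )
    return {"bucket": out_bucket, "key": key}
```

If that Task uses the Lambda invoke integration with "OutputPath": "$.Payload", the Map's ItemReader Parameters could then use "Bucket.$": "$.bucket" and "Key.$": "$.key". Writing to a separate bucket that the EventBridge rule does not match keeps the new object from re-triggering the state machine. Note that json.loads holds the whole document in memory, which for a file of about 4 GB is likely to exceed what a Lambda function can allocate; at that size a streaming parser (for example the third-party ijson library) would be needed instead.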

Upvotes: -2
