Mehmet Ataş

Reputation: 11559

Why is the DynamoDB stream not triggering the Lambda function in parallel?

I have this setup:

ApiGateway -> Lambda1 -> DynamoDB -> Lambda2 -> SNS -> SQS

Here is what I am trying to do:

  1. Make an HTTP request to API Gateway.
  2. API Gateway is integrated with Lambda1, so Lambda1 is executed.
  3. Lambda1 inserts an object into DynamoDB.
  4. The DynamoDB stream triggers Lambda2 with a batch size of 100.
  5. Lambda2 publishes a message to SNS for every inserted record.
  6. SQS is subscribed to SNS.
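As a rough sketch of steps 4-5, Lambda2's handler might look like this (Python assumed; `publish` stands in for an SNS client call such as boto3's `sns.publish(TopicArn=..., Message=...)`, and the record shape follows the standard DynamoDB stream event format):

```python
import json

# In the real Lambda2, `publish` would wrap a boto3 SNS client, e.g.:
#   sns = boto3.client("sns")
#   publish = lambda body: sns.publish(TopicArn=TOPIC_ARN, Message=body)

def handle_stream_batch(event, publish):
    """Publish one SNS message per record inserted into the table."""
    published = 0
    for record in event.get("Records", []):
        if record.get("eventName") != "INSERT":
            continue  # ignore MODIFY / REMOVE events
        new_image = record["dynamodb"]["NewImage"]
        publish(json.dumps(new_image, sort_keys=True))
        published += 1
    return published
```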

Basically, if I make an HTTP request to API Gateway, I expect a message to end up in SQS. For a single request, everything works as expected.

I made this test:

  1. Make 10 HTTP requests to warm up the Lambda functions, then wait 30 seconds.
  2. Create 100 threads. Each thread makes HTTP requests until the total request count reaches 10000.
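The load-generation step can be sketched like this (a toy harness; `send_request` is a stand-in for the actual HTTP POST to the API Gateway endpoint):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def run_load_test(total_requests, num_threads, send_request):
    """Fire `total_requests` calls from `num_threads` worker threads."""
    counter = {"sent": 0}
    lock = threading.Lock()

    def worker():
        while True:
            with lock:  # claim the next request slot atomically
                if counter["sent"] >= total_requests:
                    return
                counter["sent"] += 1
            send_request()  # e.g. an HTTP POST to the API Gateway endpoint

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        for _ in range(num_threads):
            pool.submit(worker)
        # leaving the with-block waits for all workers to finish
    return counter["sent"]
```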

The 2nd step of the test completes in 110 seconds. My DynamoDB table is provisioned for 100 writes per second, so 110 seconds makes perfect sense. After 110 seconds I see all 10000 records in my DynamoDB table.

The problem is that it takes too long for the messages to end up in SQS. I checked the logs of Lambda2 and saw that it was still being triggered 30 minutes after the test completed. In the Lambda2 logs I also see this pattern:

Start Request
Message published to SNS...
Message published to SNS...
[98 more "Message published to SNS..."]
End Request

The logs consist of repetitions of these lines. The 100 "Message published" lines make sense because the DynamoDB stream trigger is configured with a batch size of 100. Each invocation of Lambda2 takes 50-60 seconds, which means it will take ~90 minutes for all messages to end up in SQS.

What bothers me is that every "Start Request" comes after an "End Request". So the root cause seems to be that the DynamoDB stream is not triggering Lambda2 in parallel.

Question

Why is the DynamoDB stream not triggering the Lambda function in parallel? Am I missing some configuration?

Solution

After taking the advice from the answer and comments, here is my solution:

  1. I was re-creating the SNS client before publishing each message. I made it a static field in my class, and Lambda2 invocations dropped to ~15 seconds.
  2. Then I increased the batch size of the DynamoDB trigger to 1000.
  3. Inside Lambda2 I processed the DynamoDB records (publishing to SNS) using 10 threads in parallel.
  4. I increased Lambda2's memory allocation from 192MB to 512MB.
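Optimizations 1 and 3 can be sketched together (Python assumed; in the real Lambda2 the once-per-container object would be a boto3 SNS client created at module load, and `publish` would call `sns.publish`):

```python
from concurrent.futures import ThreadPoolExecutor

# Optimization 1: create expensive objects once per Lambda container,
# at module load, instead of once per message. In the real function the
# reused object would be something like: sns = boto3.client("sns")
_pool = ThreadPoolExecutor(max_workers=10)  # optimization 3: 10 threads

def publish_batch(records, publish):
    """Fan the batch out across the thread pool and wait for completion."""
    return len(list(_pool.map(publish, records)))
```

With a container-scoped pool and client, each invocation only pays for the publishes themselves, not for client setup.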

With these optimizations, I can see all 10000 messages in SQS within 10-15 seconds after the last HTTP request is sent.

Conclusion :)

To find the optimal (cheap, with acceptable latency) configuration, we need to run several tests with different batch sizes, thread counts, allocated memory, etc.

Upvotes: 1

Views: 405

Answers (1)

Kannaiyan

Reputation: 13055

As of now, there is no way to make a DynamoDB stream trigger a Lambda function in parallel. Delivery is strictly sequential, in the configured batch size.

There is also no partial delivery. If a batch is delivered to your Lambda, you need to process all of the elements in that batch; otherwise the same batch (possibly with additional records) will be delivered again later.

The Lambda also needs to complete successfully before the next batch is delivered. If it errors out, the same batch is retried repeatedly until it is processed successfully or the records expire from the stream.
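A toy model of the delivery semantics described in this answer (illustrative only, not an AWS API): batches are handed to the function one at a time, and a failing batch is retried as a whole before the stream moves on.

```python
def simulate_stream_delivery(batches, handler, max_attempts=5):
    """Deliver batches sequentially; retry a failed batch as a unit."""
    for batch in batches:
        for attempt in range(max_attempts):
            try:
                handler(batch)
                break  # move to the next batch only after success
            except Exception:
                if attempt == max_attempts - 1:
                    raise  # stand-in for the records expiring from the stream
```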

Upvotes: 2
