SunnyAk

Reputation: 588

Is Lambda the right choice when setting up a CDC application?

I am trying to implement CDC against a transactional database in AWS and load the changes into a Snowflake database. I have been able to capture changes from a Postgres database into an S3 bucket.

These are the next steps:

1. Set up event notifications on the S3 bucket to trigger a Lambda (#1).
2. Lambda #1 creates a message containing the data associated with a file and a publish timestamp, and sends it to a FIFO SQS queue, which triggers another Lambda (#2).
3. Lambda #2 runs COPY INTO statements that write the data into Snowflake.
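For step 2, Lambda #1 could build the queue message roughly like this. This is only a sketch: the field names (`s3_bucket`, `s3_key`, `published_at`) and the group id are illustrative, not a fixed schema.

```python
import json
import time

def build_cdc_message(bucket, key):
    """Build the SQS FIFO message for a newly landed CDC file.

    Hypothetical helper: the body field names are illustrative.
    """
    body = {
        "s3_bucket": bucket,
        "s3_key": key,
        "published_at": int(time.time() * 1000),  # publish timestamp in ms
    }
    return {
        "MessageBody": json.dumps(body),
        # One group id for all files => strict FIFO ordering across the queue.
        "MessageGroupId": "cdc-postgres",
        # Deduplicate retried S3 notifications for the same object.
        "MessageDeduplicationId": key,
    }

# Inside Lambda #1 the message would then be sent with boto3, e.g.:
# sqs = boto3.client("sqs")
# sqs.send_message(QueueUrl=QUEUE_URL, **build_cdc_message(bucket, key))
```

Using a single `MessageGroupId` is what actually preserves the transaction order end to end: SQS only guarantees ordering within a message group.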

This is the question that came to mind: when Lambda #2 is triggered and a batch of records fails, how can we stop future invocations? Since this is a CDC application, we have to maintain the order of transactional changes from the source database into Snowflake. Is Lambda #2 the right choice here?

Upvotes: 1

Views: 1559

Answers (2)

SunnyAk

Reputation: 588

This was another answer from mailtobash regarding the use of Lambda:

" If there is an existing EC2 that you can leverage by adding a python job, it is probably a simpler design. As a practice, we use some Lambda when it is the only technological choice available - such as custom authorizers on API gateway, infrequently used services, events on S3, etc. If you have a running set of EC2s that you maintain, then you may not need to rely on Lambdas "

Upvotes: 0

mailtobash

Reputation: 2477

There are some considerations to make when employing Lambda functions to capture CDC events, starting with the volume of data that will be migrated.
For significant data changes, there are Lambda limits that you need to be aware of, such as 3 GB of memory, 512 MB of ephemeral storage, etc. If the data happens to be large, these limits could lead to failures of the Lambda function.

Now for your question about Lambda #2, which runs the COPY INTO: you may want it to track some state indicating the last message id, timestamp, etc. that was processed, so that subsequent invocations are not triggered after a failure. Your Lambda #2 could look something like:

1. Get the SQS event details. Check in a DynamoDB or RDS table whether the same event has been processed or has failed before, and whether any past events are in a failed state. Trigger an alarm or take remediation steps if so.
2. Store the event in the DynamoDB or RDS table.
3. Perform the processing.
4. Update the table on success or failure.
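The steps above could be sketched like this. A plain dict stands in for the DynamoDB/RDS state table, and `copy_into` stands in for the Snowflake load; both names are hypothetical, and a real handler would use boto3 and a Snowflake connector instead.

```python
def should_process(event_id, state_table):
    """Decide whether Lambda #2 may run COPY INTO for this event.

    state_table maps event_id -> "SUCCEEDED" | "FAILED" | "IN_PROGRESS".
    In a real deployment this would be a DynamoDB or RDS lookup; the dict
    is a stand-in so the control flow is easy to follow.
    """
    # Skip events already handled (idempotency on SQS redelivery).
    if state_table.get(event_id) == "SUCCEEDED":
        return False
    # If any earlier event failed, halt instead of loading out of order;
    # this is where an alarm / remediation step would fire.
    if any(status == "FAILED" for status in state_table.values()):
        return False
    return True

def handle_event(event_id, state_table, copy_into):
    """Process one CDC event, recording its state before and after."""
    if not should_process(event_id, state_table):
        return "skipped"
    state_table[event_id] = "IN_PROGRESS"   # step 2: store the event
    try:
        copy_into(event_id)                  # step 3: run COPY INTO here
        state_table[event_id] = "SUCCEEDED"  # step 4: record success
        return "succeeded"
    except Exception:
        state_table[event_id] = "FAILED"     # step 4: record failure
        return "failed"
```

Once one event is marked FAILED, every later event is skipped until the state is remediated, which gives you the "stop future invocations" behavior while preserving order.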

Upvotes: 2
