Reputation: 21
I have multiple large CSV files in an S3 bucket and I want to write their data to a DynamoDB table. The issue is that my function runs for more than 15 minutes and gets a timeout error without completely writing the CSV file to DynamoDB. So is there a way to split the CSV into smaller parts?
Things I've tried so far
this - This doesn't re-invoke itself as it is supposed to (it writes a few lines to the table and then stops without any errors).
aws document - Gives an "s3fs module not found" error. I tried many things to make it work but couldn't.
Is there any way I can do this?
Thank You
Upvotes: 1
Views: 667
Reputation: 21
I was able to fix my problem (partly) by increasing the write capacity on DynamoDB to a minimum of 1000, which let me write 1 million records in about 10 minutes. I still needed to split the CSV file. Also, using batch_write instead of writing each item line by line helps tremendously.
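For reference, a minimal sketch of the batch_write approach with boto3's batch_writer (the bucket, key, and table names are placeholders, and each CSV row is assumed to contain the table's key attributes):

    # Minimal sketch: stream a CSV object from S3 and batch-write its rows to DynamoDB.
    # "my-bucket", "data.csv" and "my-table" are placeholders for your own names.
    import codecs
    import csv

    import boto3

    s3 = boto3.client("s3")
    table = boto3.resource("dynamodb").Table("my-table")

    def load_csv(bucket="my-bucket", key="data.csv"):
        obj = s3.get_object(Bucket=bucket, Key=key)
        # Decode the S3 stream line by line instead of loading the whole file.
        rows = csv.DictReader(codecs.getreader("utf-8")(obj["Body"]))

        # batch_writer() buffers items and sends them as 25-item BatchWriteItem
        # requests, retrying unprocessed items automatically.
        with table.batch_writer() as batch:
            for row in rows:
                # Each row must contain the table's partition/sort key attributes.
                batch.put_item(Item=row)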
Upvotes: 1
Reputation: 2717
I think the fan-out approach from your linked solution would be the best option.
Take a main Lambda function which splits the processing by dividing the file into chunks of lines (e.g. 1000 lines each) and fans out one call per chunk to your processing Lambda, invoked asynchronously with InvocationType Event instead of the default RequestResponse. Each processing Lambda should then only read the CSV lines assigned to it (have a look here).
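A rough sketch of what the dispatcher could look like (the worker function name csv-chunk-writer and the event fields are placeholders, not from your setup):

    import json

    import boto3

    lambda_client = boto3.client("lambda")
    CHUNK_SIZE = 1000  # lines handled per worker invocation

    def handler(event, context):
        bucket = event["bucket"]
        key = event["key"]
        total_lines = event["total_lines"]  # or count the lines here

        start = 0
        while start < total_lines:
            payload = {
                "bucket": bucket,
                "key": key,
                "start_line": start,
                "end_line": min(start + CHUNK_SIZE, total_lines),
            }
            # InvocationType="Event" fires the worker asynchronously, so the
            # dispatcher returns quickly instead of waiting for each chunk.
            lambda_client.invoke(
                FunctionName="csv-chunk-writer",  # hypothetical worker Lambda
                InvocationType="Event",
                Payload=json.dumps(payload),
            )
            start += CHUNK_SIZE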
If you have already tried this, could you perhaps post parts of your solution?
Upvotes: 2