Vibhas

Reputation: 261

Processing huge CSV file from AWS S3 to database

I have a CSV file of about 2M records which is uploaded to AWS S3 once or twice a day. I need to load this file into our database, which can handle roughly 1K records at a time, or about 40-50K records/min using batch upload.

I was planning to use AWS Lambda, but since it has a timeout of 15 minutes I would only be able to insert ~0.7M records. I have also read that one Lambda can invoke another with a new offset, but I am looking to process the file in one stretch.

What should be my ideal approach for such a scenario? Should I spin up an EC2 instance to handle the batch uploads?

Any help would be appreciated.

Upvotes: 0

Views: 1558

Answers (2)

raupach

Reputation: 3102

Why don't you have one Lambda run through the file and insert the records into SQS? That should take well under 15 minutes. A second Lambda then consumes the records from SQS and inserts them into the database. This way you don't risk overloading your database, since the consuming Lambda retrieves at most 10 records from the queue per batch.
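A rough sketch of that split in Python, assuming an S3-triggered producer and an SQS-triggered consumer with a batch size of 10; the queue URL and the `insert_rows` helper are placeholders for your own setup, not anything from the question:

```python
import csv
import json
import boto3

s3 = boto3.client("s3")
sqs = boto3.client("sqs")

# Placeholder queue URL; replace with your own queue.
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/csv-records"


def producer_handler(event, context):
    """Triggered by the S3 upload: stream the CSV and fan the rows out to SQS."""
    bucket = event["Records"][0]["s3"]["bucket"]["name"]
    key = event["Records"][0]["s3"]["object"]["key"]

    body = s3.get_object(Bucket=bucket, Key=key)["Body"]
    lines = (line.decode("utf-8") for line in body.iter_lines())
    reader = csv.DictReader(lines)

    batch = []
    for row in reader:
        # SendMessageBatch accepts at most 10 messages per call.
        batch.append({"Id": str(len(batch)), "MessageBody": json.dumps(row)})
        if len(batch) == 10:
            sqs.send_message_batch(QueueUrl=QUEUE_URL, Entries=batch)
            batch = []
    if batch:
        sqs.send_message_batch(QueueUrl=QUEUE_URL, Entries=batch)


def consumer_handler(event, context):
    """Triggered by SQS with a batch size of up to 10: insert one batch into the DB."""
    rows = [json.loads(record["body"]) for record in event["Records"]]
    insert_rows(rows)


def insert_rows(rows):
    """Placeholder for the actual batch insert (e.g. executemany with your DB driver)."""
    print(f"would insert {len(rows)} rows")
```

The consumer's batch size and reserved concurrency then become the knobs that cap the write rate against the database.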

Of course this is one solution of many.

Upvotes: 1

jarmod

Reputation: 78703

Consider using Database Migration Service.

You can migrate data from an Amazon S3 bucket to a database using AWS DMS. The source data files must be in comma-separated value (.csv) format.
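If you go that route, the S3 source endpoint needs an external table definition describing the CSV layout. A hedged boto3 sketch, where the role ARN, bucket name, and column layout are made-up placeholders you would replace with your own:

```python
import json
import boto3

dms = boto3.client("dms")

# Hypothetical column layout for the CSV; adjust names and types to match your file.
external_table_definition = {
    "TableCount": "1",
    "Tables": [
        {
            "TableName": "records",
            "TablePath": "uploads/records/",
            "TableOwner": "public",
            "TableColumns": [
                {"ColumnName": "id", "ColumnType": "INT8",
                 "ColumnNullable": "false", "ColumnIsPk": "true"},
                {"ColumnName": "payload", "ColumnType": "STRING", "ColumnLength": "255"},
            ],
            "TableColumnsTotal": "2",
        }
    ],
}

# Source endpoint pointing at the S3 bucket (role ARN and bucket are placeholders).
dms.create_endpoint(
    EndpointIdentifier="csv-s3-source",
    EndpointType="source",
    EngineName="s3",
    S3Settings={
        "ServiceAccessRoleArn": "arn:aws:iam::123456789012:role/dms-s3-access",
        "BucketName": "my-upload-bucket",
        "BucketFolder": "uploads",
        "CsvDelimiter": ",",
        "CsvRowDelimiter": "\n",
        "ExternalTableDefinition": json.dumps(external_table_definition),
    },
)
```

You would then create a target endpoint for your database and a replication task between the two; the same setup can also be done entirely from the DMS console without code.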

Upvotes: 2
