Costin

Reputation: 3029

Better/best approach to load a huge CSV file into DynamoDB

I have a huge .csv file on my local machine. I want to load that data into a DynamoDB table (eu-west-1, Ireland). How would you do that?

  1. My first approach was:

    • Iterate the CSV file locally
    • Send each row to AWS via curl -X POST -d '<row>' .../connector/mydata
    • Handle each call in a Lambda function and write the row to DynamoDB

    I do not like that solution because:

    • There are too many requests
    • If I send data without the CSV header information I have to hardcode the column names in the Lambda
    • If I send data with the CSV header there is too much traffic
  2. I was also considering putting the file in an S3 bucket and processing it with a Lambda, but the file is huge and Lambda's memory and time limits scare me.

  3. I am also considering doing the job on an EC2 machine, but I lose reactivity (if I turn off the machine when it is not in use) or I lose money (if I leave it running).

  4. I was told that Kinesis may be a solution, but I am not convinced.
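Whichever of these options ends up doing the writes, the "too many requests" and "header" concerns from approach 1 can both be reduced by reading the header once and grouping rows into batches. Below is a minimal sketch of that idea, not a complete loader. It assumes a simple CSV with no quoted or embedded commas; `batchRows` and `toPutRequests` are hypothetical helper names, and the payload uses the plain-object form accepted by the AWS SDK's DynamoDB document client (the low-level client would need typed attribute values instead). The batch size of 25 is the per-call limit of DynamoDB's BatchWriteItem operation.

```typescript
// Hypothetical helpers; assumes a simple CSV without quoted or embedded commas.
type Item = Record<string, string>;

// DynamoDB's BatchWriteItem accepts at most 25 put/delete requests per call.
const MAX_BATCH_SIZE = 25;

// Lazily turn CSV lines into batches of items, reading the header only once.
// Because this is a generator, only one batch is held in memory at a time.
function* batchRows(
  lines: Iterable<string>,
  batchSize: number = MAX_BATCH_SIZE
): Generator<Item[]> {
  let header: string[] | null = null;
  let batch: Item[] = [];
  for (const raw of lines) {
    const line = raw.trim();
    if (line === "") continue; // skip blank lines
    const cells = line.split(",");
    if (header === null) {
      header = cells; // the first non-empty line names the attributes
      continue;
    }
    const item: Item = {};
    header.forEach((name, i) => {
      item[name] = cells[i] !== undefined ? cells[i] : "";
    });
    batch.push(item);
    if (batch.length === batchSize) {
      yield batch;
      batch = [];
    }
  }
  if (batch.length > 0) yield batch; // flush the final partial batch
}

// Shape one batch as the RequestItems payload of a batch write call.
function toPutRequests(tableName: string, batch: Item[]) {
  return {
    RequestItems: {
      [tableName]: batch.map((item) => ({ PutRequest: { Item: item } })),
    },
  };
}
```

Each payload returned by `toPutRequests` would then be passed to a batch write call. A real loader would also need to resend any unprocessed items the service reports back and respect the table's write capacity, which this sketch leaves out.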

Please tell me what would be the best approach to get the huge CSV file in DynamoDB if you were me. I want to minimise the workload for a "second" upload.

I prefer using Node.js or R. Python may be acceptable as a last resort.

Upvotes: 2

Views: 6397

Answers (2)

Dee

Reputation: 11

If all your data is in S3, you can use AWS Data Pipeline's predefined template to 'import DynamoDB data from S3'. It should be straightforward to configure.

Upvotes: 1

E.J. Brennan

Reputation: 46859

If you want to do it the AWS way, then AWS Data Pipeline may be the best approach.

Here is a tutorial that does a bit more than you need, but should get you started:

The first part of this tutorial explains how to define an AWS Data Pipeline pipeline to retrieve data from a tab-delimited file in Amazon S3 to populate a DynamoDB table, use a Hive script to define the necessary data transformation steps, and automatically create an Amazon EMR cluster to perform the work.

http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-importexport-ddb-part1.html

Upvotes: 3
