Reputation: 3029
I have a huge .csv file on my local machine. I want to load that data into a DynamoDB table (eu-west-1, Ireland). How would you do that?
My first approach was:
curl -X POST -d '<row>' .../connector/mydata
I do not like that solution because:
I was also considering putting the file in an S3 bucket and processing it with a Lambda function, but the file is huge and Lambda's memory and time limits scare me.
I am also considering doing the job on an EC2 machine, but then I either lose reactivity (if I turn the machine off when it is not in use) or lose money (if I leave it running).
I was told that Kinesis may be a solution, but I am not convinced.
Please tell me what would be the best approach to get the huge CSV file into DynamoDB if you were me. I want to minimise the workload for a "second" upload.
I prefer using Node.js or R. Python may be acceptable as a last solution.
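For reference, here is a minimal sketch of what a local Node.js script could look like: stream the CSV and write it to DynamoDB in batches of 25 with BatchWriteItem. The table name, file path, and column names are placeholders I made up, and the comma split is naive; a real CSV parser should replace it if the file has quoted fields.

    // Sketch: stream a local CSV into DynamoDB with BatchWriteItem (max 25 items per call).
    // Assumptions: table "mydata", a simple comma-separated file named "huge.csv"
    // whose first row is the header; swap in a proper CSV parser for quoted fields.
    const fs = require('fs');
    const readline = require('readline');
    const AWS = require('aws-sdk');

    const dynamo = new AWS.DynamoDB.DocumentClient({ region: 'eu-west-1' });
    const TABLE = 'mydata';

    async function flush(batch) {
      // Retry any items DynamoDB could not process in this call.
      let params = { RequestItems: { [TABLE]: batch } };
      do {
        const res = await dynamo.batchWrite(params).promise();
        params = { RequestItems: res.UnprocessedItems };
      } while (params.RequestItems[TABLE] && params.RequestItems[TABLE].length);
    }

    async function load(file) {
      const lines = readline.createInterface({ input: fs.createReadStream(file) });
      let header = null;
      let batch = [];
      for await (const line of lines) {
        const cols = line.split(','); // naive: does not handle quoted fields
        if (!header) { header = cols; continue; }
        const item = {};
        header.forEach((name, i) => { item[name] = cols[i]; });
        batch.push({ PutRequest: { Item: item } });
        if (batch.length === 25) { await flush(batch); batch = []; }
      }
      if (batch.length) await flush(batch);
    }

    load('huge.csv').catch(console.error);

A "second" upload would just be re-running the script; PutRequest overwrites items with the same key, so re-uploads of the same rows are idempotent.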
Upvotes: 2
Views: 6397
Reputation: 11
If all your data is in S3, you can use AWS Data Pipeline's predefined template to import DynamoDB data from S3. It should be straightforward to configure.
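If the blocker is getting the local file into S3 in the first place, a managed upload from Node.js streams the file and handles multipart upload of large objects for you. A sketch, with the bucket and key names as placeholders:

    // Sketch: stream the local CSV to S3 so a Data Pipeline import can pick it up.
    // Bucket and key names below are placeholders.
    const fs = require('fs');
    const AWS = require('aws-sdk');

    const s3 = new AWS.S3({ region: 'eu-west-1' });

    s3.upload({
      Bucket: 'my-import-bucket',
      Key: 'imports/huge.csv',
      Body: fs.createReadStream('huge.csv'), // upload() switches to multipart for large bodies
    })
      .promise()
      .then(data => console.log('Uploaded to', data.Location))
      .catch(console.error);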
Upvotes: 1
Reputation: 46859
If you want to do it the AWS way, then AWS Data Pipeline may be the best approach.
Here is a tutorial that does a bit more than you need, but should get you started:
The first part of this tutorial explains how to define an AWS Data Pipeline pipeline to retrieve data from a tab-delimited file in Amazon S3 to populate a DynamoDB table, use a Hive script to define the necessary data transformation steps, and automatically create an Amazon EMR cluster to perform the work.
http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-importexport-ddb-part1.html
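For the "second" upload, one option (my assumption, not part of the tutorial) is to keep the pipeline around and re-activate it from Node.js after replacing the file in S3. The pipeline is assumed to have been created once, e.g. from the console template, and the pipeline ID below is a placeholder:

    // Sketch: re-run an existing import pipeline after uploading a fresh CSV to S3.
    // 'df-0123456789ABCDEF' is a placeholder pipeline ID.
    const AWS = require('aws-sdk');

    const datapipeline = new AWS.DataPipeline({ region: 'eu-west-1' });

    datapipeline.activatePipeline({ pipelineId: 'df-0123456789ABCDEF' })
      .promise()
      .then(() => console.log('Pipeline activated'))
      .catch(console.error);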
Upvotes: 3