Hunter McMillen

Reputation: 61510

Amazon EC2 and S3: How to read and write data

I have just followed this guide: http://rogueleaderr.tumblr.com/post/32768181371/set-up-and-run-a-fully-distributed-hadoop-hbase-cluster to get a cluster set up on Amazon EC2 with Hadoop and HBase running.

What I am wondering now is: how do I actually get my data into the HBase instance running on my cluster? Do I need to load it into S3 first and then load it into my HBase cluster?

Is there a best practice for loading and extracting data? Any pointers would be appreciated, as I am new to EC2.

Upvotes: 4

Views: 1955

Answers (1)

Daan

Reputation: 3348

You'll want to SSH into one of your nodes, and then you can copy the data into HDFS with something like:

hadoop fs -copyFromLocal data/sample_rdf.nt input/sample_rdf.nt

This copies the file from the node's local filesystem into HDFS. Of course, that assumes the file is already on the node, so you'll either have to upload it to EC2 first or get your EC2 node to download it from somewhere.
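For example (a rough sketch only; the key file, user, and hostname below are placeholders for your own setup), you could push the file up to the node with scp and then copy it into HDFS from there:

# From your local machine: copy the file up to the node (key and host are placeholders)
scp -i my-key.pem data/sample_rdf.nt ubuntu@ec2-xx-xx-xx-xx.compute-1.amazonaws.com:~/data/

# Then, on the node, load it into HDFS
hadoop fs -mkdir input
hadoop fs -copyFromLocal data/sample_rdf.nt input/sample_rdf.nt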

Depending on how often you'll be destroying your nodes and whether you want to keep the file available for later use, it can make sense to upload it to S3 instead and copy it down to your node from S3 using s3cmd.
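Roughly, that workflow looks like this (a sketch only; the bucket name is made up, and it assumes s3cmd is already installed and configured via s3cmd --configure):

# From your local machine: upload the file to S3
s3cmd put data/sample_rdf.nt s3://my-hbase-data/sample_rdf.nt

# Later, from the EC2 node: pull it back down and copy it into HDFS
s3cmd get s3://my-hbase-data/sample_rdf.nt data/sample_rdf.nt
hadoop fs -copyFromLocal data/sample_rdf.nt input/sample_rdf.nt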

(There are some more examples in that tutorial you followed, in part III.)

Upvotes: 4
