Alex Gordon

Reputation: 60871

Getting started with the latest version of Hadoop and EC2

I am following Hadoop in Action to get started with Hadoop on EC2. I'm running Ubuntu and have downloaded and installed the latest version of Hadoop. I am hitting a roadblock at this command:

hadoop-ec2 launch-cluster mycluster 2

The book says "The Hadoop EC2 tools are in the directory src/contrib/ec2/bin under your Hadoop installation. Recall that our ec2-init.sh script has already added that directory to your system PATH. Within that directory is hadoop-ec2, which is a meta-command for executing other commands. To launch a Hadoop Cluster on ec2 use:

hadoop-ec2 launch-cluster <cluster-name> <number-of-slaves>"


The response I get is: hadoop-ec2: command not found

I noticed that the variable $HADOOP_HOME is not set.
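One way to narrow down a "command not found" like this is to check whether the script exists where the book says it should and whether that directory is actually on PATH. The following is only a minimal sketch: `HADOOP_DIR` and its default of `$HOME/hadoop` are assumptions for illustration, the `on_path` helper is hypothetical, and the latest Hadoop release may not ship `src/contrib/ec2/bin` at all.

```shell
#!/bin/sh
# Assumed install location; adjust to wherever Hadoop was unpacked.
HADOOP_DIR="${HADOOP_DIR:-$HOME/hadoop}"
# Directory named in the book; it may not exist in newer releases.
EC2_BIN="$HADOOP_DIR/src/contrib/ec2/bin"

# Hypothetical helper: succeeds if directory $1 is an entry in the
# colon-separated PATH-style string $2.
on_path() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

if [ ! -x "$EC2_BIN/hadoop-ec2" ]; then
    echo "hadoop-ec2 not found in $EC2_BIN"
elif on_path "$EC2_BIN" "$PATH"; then
    echo "$EC2_BIN is already on PATH"
else
    echo "adding $EC2_BIN to PATH for this shell"
    PATH="$PATH:$EC2_BIN"
    export PATH
fi
```

If the first message is printed, the problem is the missing script rather than the PATH setting.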

It looks like this book is outdated.

  1. Can someone direct me to a tutorial created in the last couple of months on how to set up Hadoop with EC2?
  2. After some quick googling, it seems that HADOOP_HOME is deprecated. Is this true?
  3. I can run ec2-describe-images without problems and get all the available images that I can use. Why doesn't the hadoop-ec2 command work?

Thank you for your guidance.

Upvotes: 1

Views: 729

Answers (1)

Steffen Opel

Reputation: 64761

Unfortunately, the dedicated page Running Hadoop on Amazon EC2 (which indeed doesn't use HADOOP_HOME) is itself fairly out of date and no longer seems to apply to the most recent stable version (1.0.4 at the time of this writing). I'm not aware of an updated 'native' tutorial, but users are apparently quite happy with an approach via Apache Whirr (which incidentally started out in 2007 as a set of bash scripts in Apache Hadoop for running Hadoop clusters on EC2).

Accordingly, there is a Getting Started with Whirr™ guide available, and there are related third-party tutorials as well.
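As a rough idea of what the Whirr approach involves, a cluster is described in a properties file and then launched with the `whirr` command. The sketch below writes such a file. The cluster name, the node counts and the file name `hadoop.properties` are illustrative assumptions, and the property names follow the Whirr quick start as I recall it, so please check them against the Whirr version you install.

```shell
#!/bin/sh
# Sketch: write a minimal Whirr recipe for a small Hadoop cluster on EC2.
# Cluster name and node counts are illustrative assumptions.
# The quoted 'EOF' keeps ${env:...} literal so Whirr, not the shell,
# resolves the AWS credentials from the environment.
cat > hadoop.properties <<'EOF'
whirr.cluster-name=mycluster
whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,2 hadoop-datanode+hadoop-tasktracker
whirr.provider=aws-ec2
whirr.identity=${env:AWS_ACCESS_KEY_ID}
whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
EOF

# Launching is left commented out because it starts billable EC2 instances:
# whirr launch-cluster --config hadoop.properties
# whirr destroy-cluster --config hadoop.properties
```

The two worker nodes here mirror the `hadoop-ec2 launch-cluster mycluster 2` command from the question.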

I hope you'll be able to combine the information in the book about using Apache Hadoop with these resources on running a Hadoop cluster via Apache Whirr. Good luck!

Upvotes: 1
