Nayn

Reputation: 3614

On demand slave generation in Hadoop cluster on EC2

I am planning to use Hadoop on EC2. Since we pay per instance, it is wasteful to keep a fixed number of instances running regardless of what the jobs actually require.

In our application, many jobs run concurrently, and we do not always know the slave requirement in advance. Is it possible to start the Hadoop cluster with a minimum number of slaves and then manage their availability based on demand?

That is, create/destroy slaves on demand.

Sub-question: Can a Hadoop cluster manage multiple jobs concurrently?

Thanks

Upvotes: 1

Views: 258

Answers (3)

Andrei Savu

Reputation: 8685

Just want to let you know that we are doing some work on this in Apache Whirr. We are tracking progress in WHIRR-214. Vote or join development. :)

Upvotes: 0

Nayn

Reputation: 3614

This seems promising: Hadoop On Demand (HOD), a tool for provisioning and managing Hadoop clusters on demand: http://hadoop.apache.org/common/docs/r0.17.1/hod.html

Upvotes: 0

Dmytro Molkov

Reputation: 11

The default scheduler used in Hadoop is a simple FIFO one. You can look into using the FairScheduler, which assigns each running job a share of the cluster and has extensive configuration options to control those shares.
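For reference, switching from FIFO to the FairScheduler is just configuration on a classic (MRv1) JobTracker. A minimal sketch for mapred-site.xml, assuming the fair scheduler jar is on the JobTracker's classpath; the allocation-file path below is a placeholder you would adapt to your install:

    <!-- mapred-site.xml: replace the default FIFO scheduler with the
         FairScheduler (classic MRv1 property names) -->
    <property>
      <name>mapred.jobtracker.taskScheduler</name>
      <value>org.apache.hadoop.mapred.FairScheduler</value>
    </property>
    <!-- Optional: per-pool shares are defined in a separate allocation
         file; this path is a placeholder -->
    <property>
      <name>mapred.fairscheduler.allocation.file</name>
      <value>/etc/hadoop/conf/fair-scheduler.xml</value>
    </property>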

As far as EC2 is concerned, you can easily start off with some number of nodes and then, once you see that there are too many tasks in the queue and all the slots in the cluster are occupied, add more. You simply have to start up an instance and launch a TaskTracker on it, which will register itself with the JobTracker.

However, you will need your own system to manage the startup and shutdown of these nodes; a minimal sketch follows.
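For illustration, here is a rough sketch of such a controller using boto, the Python EC2 library. The AMI id, instance type, and Hadoop install path are all placeholders; it assumes an image whose mapred-site.xml already points at your JobTracker, so a freshly started TaskTracker registers itself automatically:

    import boto

    # Shell script passed as EC2 user-data: it runs on first boot and
    # starts a TaskTracker, which then registers itself with the
    # JobTracker named in the AMI's mapred-site.xml. The Hadoop path
    # is a placeholder.
    USER_DATA = """#!/bin/bash
    /usr/lib/hadoop/bin/hadoop-daemon.sh start tasktracker
    """

    def add_slave(conn, ami_id='ami-12345678'):
        """Launch one extra worker node (the AMI id is hypothetical)."""
        reservation = conn.run_instances(ami_id,
                                         instance_type='m1.large',
                                         user_data=USER_DATA)
        return reservation.instances[0]

    def remove_slave(conn, instance):
        """Terminate an idle worker. In practice, decommission the
        TaskTracker first (via the file named by mapred.hosts.exclude)
        so running tasks are rescheduled instead of killed."""
        conn.terminate_instances(instance_ids=[instance.id])

    if __name__ == '__main__':
        conn = boto.connect_ec2()  # credentials from env/boto config
        slave = add_slave(conn)
        # ... poll the JobTracker for queued tasks vs. free slots,
        # then call remove_slave(conn, slave) when load drops ...

The decision logic (when to add or remove a node) is the part you would have to write yourself, for example by comparing queued tasks against free map/reduce slots on the JobTracker.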

Upvotes: 1
