palash kulshreshtha

Reputation: 587

Need help setting up a Hadoop cluster in a college lab

For my final-year project I am required to set up a small Hadoop cluster in my college lab. I have worked with Hadoop before, but only in pseudo-distributed mode. The task now is to install Ubuntu on all the computers we have and then set up Hadoop on each of them. I am planning to do this using a custom Ubuntu 12.04 ISO that already includes a Hadoop user.

What I am thinking of doing:
1. Install the ISO on all the systems.
2. If step 1 goes well, configure Hadoop on each system. This is tricky because the IPs of the lab computers are dynamic and keep changing. Is there any way I can bind the MAC addresses of these PCs, so that whenever one of those MAC addresses comes up, the master node includes that machine in the cluster?

I have a lot of uncertainties, like:

1. Is there a better way to do this, for example by automating some parts of it?
2. Am I better off using VirtualBox and a Hadoop ISO on each machine?
3. I have some experience with Hadoop 1, but Hadoop 2 is out now. Should I use Hadoop 2 or stay with Hadoop 1?

Any suggestions? How should I proceed?

Upvotes: 0

Views: 517

Answers (2)

kishorer747

Reputation: 820

@palash kulshreshtha, I am in the same position as you, except that I can have static IPs. I set up a Hadoop 2.4.1 cluster on 5 computers in my college for my project.

I recommend installing Ubuntu directly on all the computers and configuring Hadoop for a multi-node setup, rather than using Cloudera's VM or running Ubuntu inside a virtual machine, because virtualization will reduce performance.

As far as dynamic IPs are concerned: if a device has communicated on the same network in the last 30 seconds (or whatever your ARP timeout is set to), you can see both its IP and its MAC address by running arp -a in an Ubuntu terminal.

You can write a simple Python script that goes through all the hosts and finds the IP of the device with the MAC address you want.
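A minimal sketch of such a script, assuming Python 3.7+ and the net-tools arp -a output format on Ubuntu (lines like "? (192.168.1.23) at 00:1a:2b:3c:4d:5e [ether] on eth0"). The function names are illustrative, not part of any library. Because the ARP cache only holds hosts that have talked to this machine recently, you may need to ping the lab machines first so their entries are present.

```python
import re
import subprocess
from typing import Dict, Optional

# Assumed net-tools format: "? (192.168.1.23) at 00:1a:2b:3c:4d:5e [ether] on eth0".
# Entries shown as "<incomplete>" have no MAC yet and simply do not match.
ARP_LINE = re.compile(
    r"\((\d{1,3}(?:\.\d{1,3}){3})\) at ([0-9a-fA-F]{2}(?::[0-9a-fA-F]{2}){5})"
)


def parse_arp_table(text: str) -> Dict[str, str]:
    """Return a {mac: ip} map (MACs lower-cased) from `arp -a` output."""
    table: Dict[str, str] = {}
    for line in text.splitlines():
        match = ARP_LINE.search(line)
        if match:
            ip, mac = match.groups()
            table[mac.lower()] = ip
    return table


def find_ip(mac: str, arp_output: Optional[str] = None) -> Optional[str]:
    """Current IP for `mac`, or None if it is not in the ARP cache.

    If no output is passed in, this runs `arp -a` on the local machine.
    """
    if arp_output is None:
        arp_output = subprocess.run(
            ["arp", "-a"], capture_output=True, text=True, check=True
        ).stdout
    return parse_arp_table(arp_output).get(mac.lower())


if __name__ == "__main__":
    sample = (
        "? (192.168.1.23) at 00:1a:2b:3c:4d:5e [ether] on eth0\n"
        "? (192.168.1.40) at <incomplete> on eth0\n"
    )
    print(find_ip("00:1A:2B:3C:4D:5E", sample))
```

The lookup is case-insensitive because MACs are lower-cased on both sides; a MAC that is not currently in the cache returns None rather than raising.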
For help on Hadoop installation, visit www.kishorer.in
Cheers.

Upvotes: 1

Lauri Peltonen

Reputation: 1542

Do the internal IPs also change all the time? As far as I know, there is no easy way to use MAC addresses for this; you need some fixed points in the environment. If the computers are going to talk to each other, the required addressing information has to be stored somewhere, for example in the hosts file.
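One way to combine the hosts file with the MAC idea from the question is to keep a fixed MAC-to-hostname list and regenerate the /etc/hosts entries from the current ARP table. The sketch below assumes Python 3.7+ and the net-tools arp -a format; the MAC addresses and hostnames are hypothetical placeholders. It only returns the lines: writing /etc/hosts needs root, and the result has to be copied to every node. A machine never appears in its own ARP table, so the master's own entry must be added separately.

```python
import re
from typing import Dict, List

# Assumed net-tools format: "? (192.168.1.23) at 00:1a:2b:3c:4d:5e [ether] on eth0".
ARP_LINE = re.compile(
    r"\((\d{1,3}(?:\.\d{1,3}){3})\) at ([0-9a-fA-F]{2}(?::[0-9a-fA-F]{2}){5})"
)


def parse_arp_table(text: str) -> Dict[str, str]:
    """Return a {mac: ip} map (MACs lower-cased) from `arp -a` output."""
    table: Dict[str, str] = {}
    for line in text.splitlines():
        match = ARP_LINE.search(line)
        if match:
            ip, mac = match.groups()
            table[mac.lower()] = ip
    return table


def render_hosts(known: Dict[str, str], arp_output: str) -> List[str]:
    """Build "IP<TAB>hostname" lines for every known MAC currently in the ARP table.

    `known` maps MAC address -> fixed hostname. Machines that are offline
    (not in the ARP cache) are skipped. Output is sorted by hostname.
    """
    table = parse_arp_table(arp_output)
    lines = []
    for mac, hostname in sorted(known.items(), key=lambda item: item[1]):
        ip = table.get(mac.lower())
        if ip is not None:
            lines.append(f"{ip}\t{hostname}")
    return lines


if __name__ == "__main__":
    known_macs = {  # hypothetical lab machines
        "00:1a:2b:3c:4d:5e": "slave1",
        "00:1a:2b:3c:4d:5f": "slave2",
    }
    arp_text = "? (192.168.1.23) at 00:1a:2b:3c:4d:5e [ether] on eth0\n"
    print("\n".join(render_hosts(known_macs, arp_text)))
```

Keeping the hostnames fixed means the Hadoop configuration can refer to slave1, slave2 and so on, and only the hosts file has to be regenerated when the DHCP leases change.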

If you have a lot of computers, I recommend automating the process with Puppet scripts or plain bash scripts. I don't recommend VirtualBox, as virtualization will probably just bring you more problems.
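As an illustration of the scripting route, here is a hedged sketch in Python (a bash for-loop would do the same job). It builds the content of the Hadoop 2.x etc/hadoop/slaves file (conf/slaves in Hadoop 1), which lists one worker hostname per line, and one scp command per worker to copy a single configuration directory everywhere. It assumes passwordless SSH is already set up for a hypothetical hduser account and that Hadoop lives under /usr/local/hadoop on every node; both are placeholders. By default it only prints the commands (dry run).

```python
import subprocess
from typing import List, Sequence


def slaves_file_text(workers: Sequence[str]) -> str:
    """Content for etc/hadoop/slaves (Hadoop 2.x): one worker hostname per line."""
    return "".join(f"{worker}\n" for worker in workers)


def build_push_commands(
    workers: Sequence[str], local_conf_dir: str, remote_parent: str, user: str
) -> List[List[str]]:
    """One `scp -r` command per worker, copying the config directory into remote_parent."""
    return [
        ["scp", "-r", local_conf_dir, f"{user}@{worker}:{remote_parent}"]
        for worker in workers
    ]


def run_all(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them, stopping at the first failure."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    workers = ["slave1", "slave2", "slave3"]  # hypothetical hostnames
    print(slaves_file_text(workers), end="")
    run_all(
        build_push_commands(
            workers, "/usr/local/hadoop/etc/hadoop", "/usr/local/hadoop/etc/", "hduser"
        )
    )
```

Copying one master copy of the configuration to every node keeps the cluster consistent; rerun it whenever a config file changes, then set dry_run=False once the printed commands look right.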

If your task is just to install and configure Hadoop and you don't need to do anything special with it, then just go with the version you are familiar with.

Upvotes: 2
