Problems with installing Hadoop on Ubuntu 12.04

I just set up a new Ubuntu 12.04 VM (VirtualBox) and wanted to test Hadoop on it. I am following this guide: http://hadoop.apache.org/docs/r0.20.2/quickstart.html

I think I am doing something wrong with the Java installation and the JAVA_HOME path... Right now bin/hadoop always just returns "command not found".

Where do I have to extract the hadoop folder?

Do I need to set up SSH first? What about sshd?

What are the commands to install the correct Java version?

What EXACTLY do I have to enter into the hadoop-env.sh file?

Thanks!

Upvotes: 0

Views: 2140

Answers (3)

Prateek Mishra

Reputation: 1264

Installing Hadoop, Hive, Sqoop, and Pig

Follow these steps to install the applications above. Note: there is no need for an extra user; you may work with your existing system account.

  1. Download Hadoop 2.7.1, Pig, Sqoop, and Hive:

     http://www.us.apache.org/dist/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz   
    
     http://www.us.apache.org/dist/pig/pig-0.13.0/pig-0.13.0.tar.gz      
    
     http://www.us.apache.org/dist/sqoop/1.4.6/sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
    
     http://www.eu.apache.org/dist/hive/hive-1.2.1/apache-hive-1.2.1-bin.tar.gz    
    
  2. Extract them into a folder, say /home/mypc/hadoop-soft, and rename the extracted directories to match the layout below; then cd hadoop-soft:

    hive -->  /home/mypc/hadoop-soft/hive
    sqoop --> /home/mypc/hadoop-soft/sqoop
    pig   --> /home/mypc/hadoop-soft/pig
    hadoop --> /home/mypc/hadoop-soft/hadoop
    

Make sure you do not create any extra subfolders inside these folders, and that you can see each one's bin folder directly inside it.
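A sketch of the download-and-extract step, assuming the four tarballs are in ~/Downloads (the mv commands rename the extracted directories so they match the mapping above):

    mkdir -p /home/mypc/hadoop-soft && cd /home/mypc/hadoop-soft
    tar -xzf ~/Downloads/hadoop-2.7.1.tar.gz && mv hadoop-2.7.1 hadoop
    tar -xzf ~/Downloads/apache-hive-1.2.1-bin.tar.gz && mv apache-hive-1.2.1-bin hive
    tar -xzf ~/Downloads/sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz && mv sqoop-1.4.6.bin__hadoop-2.0.4-alpha sqoop
    tar -xzf ~/Downloads/pig-0.13.0.tar.gz && mv pig-0.13.0 pig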

  1. Now let's move these folders to /usr/lib:

     sudo mkdir /usr/lib/hadoop
    
     sudo mv sqoop/ /usr/lib/hadoop/
     sudo mv pig/ /usr/lib/hadoop/
     sudo mv hive/ /usr/lib/hadoop/
     sudo mv hadoop/ /usr/lib/hadoop/
    
  2. Edit the .bashrc file to add the paths. Add the following lines at the end of the file.

    Remove any existing JAVA_HOME statement, if present, since we are setting it here.

    Check that Java is installed and present at the location used below. If it is, fine; if not, install it first (a sketch follows).
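    On Ubuntu, one way to get the exact path used below is the OpenJDK 7 package (a sketch; /usr/lib/jvm/java-7-openjdk-amd64 is where the 64-bit package installs):

     sudo apt-get update
     sudo apt-get install openjdk-7-jdk
     ls /usr/lib/jvm/java-7-openjdk-amd64   # should list the JDK after the install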

     sudo gedit ~/.bashrc
    

Add the following lines at the end of .bashrc:

     export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
     export HADOOP_HOME=/usr/lib/hadoop/hadoop
     export HIVE_HOME=/usr/lib/hadoop/hive
     export PIG_HOME=/usr/lib/hadoop/pig
     export SQOOP_HOME=/usr/lib/hadoop/sqoop

     export HADOOP_MAPRED_HOME=$HADOOP_HOME
     export HADOOP_COMMON_HOME=$HADOOP_HOME
     export HADOOP_HDFS_HOME=$HADOOP_HOME
     export HADOOP_YARN_HOME=$HADOOP_HOME
     export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

     export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$PIG_HOME/bin:$HIVE_HOME/bin:$SQOOP_HOME/bin
  1. Save this file and close it. Then source it so that the updates take effect:

    source ~/.bashrc 
    
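To confirm the update took effect, you can echo one of the new variables (hadoop version will only work once the move in step 3 is done):

     echo $HADOOP_HOME
     hadoop version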

  6. Create two directories, namenode and datanode:

    cd /usr/lib
    sudo mkdir hdfs
    cd hdfs
    sudo mkdir namenode
    sudo mkdir datanode
    sudo chmod -R 777 namenode
    sudo chmod -R 777 datanode
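chmod 777 works but is wide open; a tighter alternative, assuming you run the Hadoop daemons as your own user, is to take ownership of the tree instead:

    sudo chown -R $USER:$USER /usr/lib/hdfs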
  1. Go to $HADOOP_HOME and edit the configuration files.

    cd $HADOOP_HOME
    cd etc/hadoop/
    

    A. sudo gedit yarn-site.xml : Add these lines inside <configuration> </configuration>

    <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
    </property>
    <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    
    • Save File and Close

    B. sudo gedit core-site.xml : Add these lines inside <configuration> </configuration>

    <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
    </property>
    
    • Save File and Close.

    C. sudo gedit hdfs-site.xml : Add these lines inside <configuration> </configuration>

    <property>
    <name>dfs.replication</name>
    <value>1</value>
    </property>
    <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/lib/hdfs/namenode</value>
    </property>
    <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/lib/hdfs/datanode</value>
    </property>
    • Save File and Close.

    D. sudo gedit mapred-site.xml : Add these lines

     <?xml version="1.0"?>
     <configuration>
         <property>
             <name>mapreduce.framework.name</name>
             <value>yarn</value>
         </property>
     </configuration>

Note: this will be a new file. Save it and close.
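Hadoop 2.x ships a template for this file, so instead of creating it from scratch you can copy the template (path as laid out above):

    cd $HADOOP_HOME/etc/hadoop
    sudo cp mapred-site.xml.template mapred-site.xml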

  1. Format the namenode:

    hdfs namenode -format

  2. Go to /usr/lib/hdfs and create start and stop scripts:

    cd /usr/lib/hdfs
    sudo mkdir scripts
    sudo chmod 777 -R scripts
    cd scripts
    sudo gedit hadoopstart.sh
    

Write these lines

    /usr/lib/hadoop/hadoop/sbin/hadoop-daemon.sh start namenode
    /usr/lib/hadoop/hadoop/sbin/hadoop-daemon.sh start datanode
    /usr/lib/hadoop/hadoop/sbin/yarn-daemon.sh start resourcemanager
    /usr/lib/hadoop/hadoop/sbin/yarn-daemon.sh start nodemanager
    /usr/lib/hadoop/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver
  • Save it and close.

    sudo gedit hadoopstop.sh
    

Write these lines

    /usr/lib/hadoop/hadoop/sbin/hadoop-daemon.sh stop namenode
    /usr/lib/hadoop/hadoop/sbin/hadoop-daemon.sh stop datanode
    /usr/lib/hadoop/hadoop/sbin/yarn-daemon.sh stop resourcemanager
    /usr/lib/hadoop/hadoop/sbin/yarn-daemon.sh stop nodemanager
    /usr/lib/hadoop/hadoop/sbin/mr-jobhistory-daemon.sh stop historyserver

  • Save it and close.
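As an alternative to the per-daemon scripts above, Hadoop also ships aggregate scripts in its sbin directory. Note that start-dfs.sh additionally starts a SecondaryNameNode and expects passwordless SSH to localhost:

    /usr/lib/hadoop/hadoop/sbin/start-dfs.sh
    /usr/lib/hadoop/hadoop/sbin/start-yarn.sh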

  1. Run these scripts to start and stop Hadoop on your single-node setup.

To start

     sh /usr/lib/hdfs/scripts/hadoopstart.sh 

To stop

     sh /usr/lib/hdfs/scripts/hadoopstop.sh 
  1. Check that Hadoop is running (after running the start script):

    hadoop version
    hadoop fs -ls /
    
    Open http://localhost:50070 to check whether the NameNode is running.
    
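You can also run jps (shipped with the JDK) to see which Java daemons are up; with everything started you should see something like:

    jps
    # expected entries (PIDs will differ): NameNode, DataNode,
    # ResourceManager, NodeManager, JobHistoryServer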
  2. Run the various services from the terminal:

    pig
    sqoop
    hive
    

Upvotes: 0

Ohadi

Reputation: 121

I used this great tutorial. The only change was that I installed the default Java 6...

Michael Noll Tutorial for setting up Hadoop

Upvotes: 2

iTech

Reputation: 18460

  • The "command not found" error when running hadoop should not be related to JAVA_HOME. You are probably not running the command from the Hadoop home directory (the alternative is to add the full path of hadoop/bin to your PATH; see the sketch after this list).

  • You can extract hadoop folder anywhere you like

  • For hadoop-env.sh, set the JAVA_HOME variable to point to your Java installation's home directory, e.g. export JAVA_HOME=/home/jdk1.6.0/ (change this path to reflect your environment).

  • You will need SSH and sshd, especially if you will run Hadoop in a distributed or pseudo-distributed mode (the sketch after this list also covers the SSH setup).

  • Hadoop requires Java 1.6+; just download jdk-7u9-linux-i586.tar.gz from here and follow the installation guide (it should not require more than unpacking it).
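A minimal sketch of the PATH and SSH points above, assuming Hadoop was extracted to /home/user/hadoop (adjust the paths to your setup):

     # make "hadoop" callable from any directory
     export PATH=$PATH:/home/user/hadoop/bin

     # install sshd and set up passwordless SSH to localhost,
     # which pseudo-distributed mode expects
     sudo apt-get install openssh-server
     ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
     cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
     ssh localhost   # should now log in without a password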

Upvotes: 1
