Problems with installing Hadoop on Ubuntu 12.04

I just set up a new Ubuntu 12.04 VM (VirtualBox) and wanted to test Hadoop on it. I am following this guide: http://hadoop.apache.org/docs/r0.20.2/quickstart.html

I think I am doing something wrong with the Java installation and the JAVA_HOME path... Right now bin/hadoop always just returns "command not found".

Where do I have to extract the hadoop folder?

Do I need to set up SSH first? What about sshd?

What are the commands to install the correct Java version?

What EXACTLY do I have to enter into the hadoop-env.sh file?

Thanks!

Upvotes: 0

Views: 2140

Answers (3)

Prateek Mishra

Reputation: 1264

Installing Hadoop, Hive, Sqoop, and Pig

Follow these steps to install the applications above. Note: there is no need for an extra user; you may work with your existing system account.

  1. Download Hadoop 2.7.1, Pig, Sqoop, and Hive:

     http://www.us.apache.org/dist/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz   
    
     http://www.us.apache.org/dist/pig/pig-0.13.0/pig-0.13.0.tar.gz      
    
     http://www.us.apache.org/dist/sqoop/1.4.6/sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
    
     http://www.eu.apache.org/dist/hive/hive-1.2.1/apache-hive-1.2.1-bin.tar.gz    
    
  2. Extract them into a folder, say /home/mypc/hadoop-soft, and rename the extracted directories to match the layout below; then cd hadoop-soft:

    hive -->  /home/mypc/hadoop-soft/hive
    sqoop --> /home/mypc/hadoop-soft/sqoop
    pig   --> /home/mypc/hadoop-soft/pig
    hadoop --> /home/mypc/hadoop-soft/hadoop
    

Make sure you do not create any extra subfolders inside these folders, and that you can see each one's bin folder directly inside it.
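A sketch of the download-and-extract step, assuming the four tarballs are in ~/Downloads (the mv commands rename the extracted directories so they match the mapping above):

    mkdir -p /home/mypc/hadoop-soft && cd /home/mypc/hadoop-soft
    tar -xzf ~/Downloads/hadoop-2.7.1.tar.gz && mv hadoop-2.7.1 hadoop
    tar -xzf ~/Downloads/apache-hive-1.2.1-bin.tar.gz && mv apache-hive-1.2.1-bin hive
    tar -xzf ~/Downloads/sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz && mv sqoop-1.4.6.bin__hadoop-2.0.4-alpha sqoop
    tar -xzf ~/Downloads/pig-0.13.0.tar.gz && mv pig-0.13.0 pig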

  1. Now let's move these folders to /usr/lib:

     sudo mkdir /usr/lib/hadoop
    
     sudo mv sqoop/ /usr/lib/hadoop/
     sudo mv pig/ /usr/lib/hadoop/
     sudo mv hive/ /usr/lib/hadoop/
     sudo mv hadoop/ /usr/lib/hadoop/
    
  2. Edit the .bashrc file to add the paths. Add the following lines at the end of the file.

    Remove any existing JAVA_HOME statement, if present, since we are setting it here.

    Check that Java is installed and present at the location used below. If it is, fine; if not, install it first (a sketch follows).
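    On Ubuntu, one way to get the exact path used below is the OpenJDK 7 package (a sketch; /usr/lib/jvm/java-7-openjdk-amd64 is where the 64-bit package installs):

     sudo apt-get update
     sudo apt-get install openjdk-7-jdk
     ls /usr/lib/jvm/java-7-openjdk-amd64   # should list the JDK after the install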

     sudo gedit ~/.bashrc
    

Add the following lines at the end of .bashrc:

     export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
     export HADOOP_HOME=/usr/lib/hadoop/hadoop
     export HIVE_HOME=/usr/lib/hadoop/hive
     export PIG_HOME=/usr/lib/hadoop/pig
     export SQOOP_HOME=/usr/lib/hadoop/sqoop

     export HADOOP_MAPRED_HOME=$HADOOP_HOME
     export HADOOP_COMMON_HOME=$HADOOP_HOME
     export HADOOP_HDFS_HOME=$HADOOP_HOME
     export HADOOP_YARN_HOME=$HADOOP_HOME
     export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

     export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$PIG_HOME/bin:$HIVE_HOME/bin:$SQOOP_HOME/bin
  1. Save this file and close it. Then source it so that the updates take effect:

    source ~/.bashrc 
    
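To confirm the update took effect, you can echo one of the new variables (hadoop version will only work once the move in step 3 is done):

     echo $HADOOP_HOME
     hadoop version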

  6. Create two directories, namenode and datanode:

    cd /usr/lib
    sudo mkdir hdfs
    cd hdfs
    sudo mkdir namenode
    sudo mkdir datanode
    sudo chmod -R 777 namenode
    sudo chmod -R 777 datanode
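chmod 777 works but is wide open; a tighter alternative, assuming you run the Hadoop daemons as your own user, is to take ownership of the tree instead:

    sudo chown -R $USER:$USER /usr/lib/hdfs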
  1. Go to $HADOOP_HOME and edit the configuration files.

    cd $HADOOP_HOME
    cd etc/hadoop/
    

    A. sudo gedit yarn-site.xml : Add these lines inside <configuration> </configuration>

    <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
    </property>
    <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    
    • Save File and Close

    B. sudo gedit core-site.xml : Add these lines inside <configuration> </configuration>

    <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
    </property>
    
    • Save File and Close.

    C. sudo gedit hdfs-site.xml : Add these lines inside <configuration> </configuration>

    <property>
    <name>dfs.replication</name>
    <value>1</value>
    </property>
    <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/lib/hdfs/namenode</value>
    </property>
    <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/lib/hdfs/datanode</value>
    </property>
    • Save File and Close.

    D. sudo gedit mapred-site.xml : Add these lines

     <?xml version="1.0"?>
     <configuration>
         <property>
             <name>mapreduce.framework.name</name>
             <value>yarn</value>
         </property>
     </configuration>

Note: this will be a new file. Save it and close.
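Hadoop 2.x ships a template for this file, so instead of creating it from scratch you can copy the template (path as laid out above):

    cd $HADOOP_HOME/etc/hadoop
    sudo cp mapred-site.xml.template mapred-site.xml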

  1. Format the namenode:

    hdfs namenode -format

  2. Go to /usr/lib/hdfs and create start and stop scripts:

    cd /usr/lib/hdfs
    sudo mkdir scripts
    sudo chmod 777 -R scripts
    cd scripts
    sudo gedit hadoopstart.sh
    

Write these lines

    /usr/lib/hadoop/hadoop/sbin/hadoop-daemon.sh start namenode
    /usr/lib/hadoop/hadoop/sbin/hadoop-daemon.sh start datanode
    /usr/lib/hadoop/hadoop/sbin/yarn-daemon.sh start resourcemanager
    /usr/lib/hadoop/hadoop/sbin/yarn-daemon.sh start nodemanager
    /usr/lib/hadoop/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver
  • Save it and close.

    sudo gedit hadoopstop.sh
    

Write these lines

    /usr/lib/hadoop/hadoop/sbin/hadoop-daemon.sh stop namenode
    /usr/lib/hadoop/hadoop/sbin/hadoop-daemon.sh stop datanode
    /usr/lib/hadoop/hadoop/sbin/yarn-daemon.sh stop resourcemanager
    /usr/lib/hadoop/hadoop/sbin/yarn-daemon.sh stop nodemanager
    /usr/lib/hadoop/hadoop/sbin/mr-jobhistory-daemon.sh stop historyserver

  • Save it and close.
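As an alternative to the per-daemon scripts above, Hadoop also ships aggregate scripts in its sbin directory. Note that start-dfs.sh additionally starts a SecondaryNameNode and expects passwordless SSH to localhost:

    /usr/lib/hadoop/hadoop/sbin/start-dfs.sh
    /usr/lib/hadoop/hadoop/sbin/start-yarn.sh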

  1. Run these scripts to start and stop Hadoop on your single-node setup.

To start

     sh /usr/lib/hdfs/scripts/hadoopstart.sh 

To stop

     sh /usr/lib/hdfs/scripts/hadoopstop.sh 
  1. Check that Hadoop is running (after running the start script):

    hadoop version
    hadoop fs -ls /
    
    Open http://localhost:50070 to check whether the NameNode is running.
    
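You can also run jps (shipped with the JDK) to see which Java daemons are up; with everything started you should see something like:

    jps
    # expected entries (PIDs will differ): NameNode, DataNode,
    # ResourceManager, NodeManager, JobHistoryServer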
  2. Run the various services from the terminal:

    pig
    sqoop
    hive
    

Upvotes: 0

Ohadi

Reputation: 121

I used this great tutorial. The only change was that I installed the default Java 6...

Michael Noll Tutorial for setting up Hadoop

Upvotes: 2

iTech

Reputation: 18460

  • The "command not found" error when running hadoop should not be related to JAVA_HOME. You are probably not running the command from the Hadoop home directory (the alternative is to add the full path of hadoop/bin to your PATH; see the sketch after this list).

  • You can extract hadoop folder anywhere you like

  • For hadoop-env.sh, set the JAVA_HOME variable to point to your Java installation's home directory, e.g. export JAVA_HOME=/home/jdk1.6.0/ (change this path to reflect your environment).

  • You will need SSH and sshd, especially if you will run Hadoop in a distributed or pseudo-distributed mode (the sketch after this list also covers the SSH setup).

  • Hadoop requires Java 1.6+; just download jdk-7u9-linux-i586.tar.gz from here and follow the installation guide (it should not require more than unpacking it).
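A minimal sketch of the PATH and SSH points above, assuming Hadoop was extracted to /home/user/hadoop (adjust the paths to your setup):

     # make "hadoop" callable from any directory
     export PATH=$PATH:/home/user/hadoop/bin

     # install sshd and set up passwordless SSH to localhost,
     # which pseudo-distributed mode expects
     sudo apt-get install openssh-server
     ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
     cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
     ssh localhost   # should now log in without a password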

Upvotes: 1
