levand

Reputation: 8500

How can I make HBase wait to start until HDFS is ready?

I'm building an automated install script for Hadoop, and I'm encountering an issue where HBase won't start, because HDFS isn't yet fully booted and ready. How can I programmatically (from Bash, ideally) tell whether the HDFS system is ready for HBase to boot, so I can wait until it is?

I tried using "hadoop dfsadmin -report" and grepping for the correct number of nodes, but apparently that will still return before the cluster is actually ready for business.

Upvotes: 2

Views: 836

Answers (2)

SomeBottle

Reputation: 11

It seems like this is a somewhat older post, but I'd like to add to it a little bit.

When starting HDFS, if you use hdfs dfsadmin -safemode wait, it may throw the following exception:

safemode: Call From shworker1/xxx.xxx.xxx.xxx to main:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

This happens because HDFS has not finished starting: the NameNode is not yet accepting connections, so the command fails immediately instead of waiting.

Therefore, I've added a simple loop to ensure that HDFS starts up correctly and exits safe mode before proceeding with further operations:

# "hdfs dfsadmin -safemode wait" blocks until HDFS leaves safe mode, but it
# exits with an error (ConnectException: Connection refused) if the NameNode
# is not up yet, so keep retrying until it succeeds.
until hdfs dfsadmin -safemode wait; do
    echo "HDFS not started yet... Retrying..."
    sleep 0.5
done

Hope this will help you!

Upvotes: 1

Hari Menon

Reputation: 35405

Use hadoop dfsadmin -safemode wait to check if HDFS is out of safe mode yet. Something like this should do the trick:

while $HADOOP_HOME/bin/hadoop dfsadmin -safemode wait | grep ON
do
    sleep 1s # Or 10s or 1m or whatever time
done

EDIT: As levand mentions in the comments, per HADOOP-756, -safemode wait itself blocks until safe mode is off. In that case you can simply issue the command once, and the while loop is unnecessary. The loop is still useful if you want to keep trying only for a limited time and give up when DFS still has not come up. I have seen that happen when we made mistakes in setting up the cluster.
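For the "only for a certain amount of time" case, a minimal sketch of a bounded retry might look like the following. The function name retry_until and the timeout values are hypothetical, chosen only for illustration; the sketch assumes a shell that supports local (bash, dash) and a date command that accepts +%s.

```shell
#!/bin/sh
# Sketch only: retry_until is a hypothetical helper, not part of Hadoop.
# Usage: retry_until MAX_WAIT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Runs COMMAND repeatedly until it succeeds or MAX_WAIT_SECONDS have elapsed.
retry_until() {
    local max_wait="$1"
    local interval="$2"
    shift 2
    local deadline
    deadline=$(( $(date +%s) + max_wait ))

    until "$@"; do
        # The command failed; give up once the deadline has passed.
        if [ "$(date +%s)" -ge "$deadline" ]; then
            echo "Gave up after ${max_wait}s waiting for: $*" >&2
            return 1
        fi
        sleep "$interval"
    done
}
```

It could then be called as retry_until 300 5 hdfs dfsadmin -safemode wait before starting HBase. Note that the deadline is only checked between attempts, so a single -safemode wait call that blocks for a long time is not interrupted. If GNU coreutils is available, wrapping each attempt as timeout 60 hdfs dfsadmin -safemode wait is one way to bound that as well.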

Upvotes: 6
