Reputation: 8500
I'm building an automated install script for Hadoop, and I'm encountering an issue where HBase won't start, because HDFS isn't yet fully booted and ready. How can I programmatically (from Bash, ideally) tell whether the HDFS system is ready for HBase to boot, so I can wait until it is?
I tried using "hadoop dfsadmin -report" and grepping for the correct number of nodes, but apparently that will still return before the cluster is actually ready for business.
Upvotes: 2
Views: 836
Reputation: 11
It seems like this is a somewhat older post, but I'd like to add to it a little bit.
When starting HDFS, if you use hdfs dfsadmin -safemode wait, it may throw the following exception:
safemode: Call From shworker1/xxx.xxx.xxx.xxx to main:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
This happens because HDFS has not finished starting: the NameNode is not yet accepting connections.
Therefore, I've added a simple loop to ensure that HDFS starts up correctly and exits safe mode before proceeding with further operations:
while true; do
    # Block until HDFS exits safe mode or cannot connect to the NameNode
    hdfs dfsadmin -safemode wait
    if [ $? -eq 0 ]; then
        break
    fi
    # Note that HDFS may not have started yet and may throw ConnectException: Connection refused, so we need to continue looping
    sleep 0.5
    echo "HDFS not started yet...Retrying..."
done
Hope this will help you!
Upvotes: 1
Reputation: 35405
Use hadoop dfsadmin -safemode wait to check if HDFS is out of safe mode yet. Something like this should do the trick:
while "$HADOOP_HOME/bin/hadoop" dfsadmin -safemode wait | grep ON
do
    sleep 1s # Or 10s or 1m or whatever time
done
EDIT: As levand mentions in the comments, per HADOOP-756, -safemode wait itself blocks until safe mode is off. In that case you can simply issue the wait command, and the while loop is unnecessary. Still, if you want to keep trying only for a limited amount of time and kill the process if DFS is still not up, a while loop can be useful. I have seen that kind of thing happen when we make mistakes in setting up the cluster.
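For the "keep trying only for a certain amount of time" case, here is a rough sketch of a bounded wait. The helper names retry_until and hdfs_ready are made up for illustration and are not part of Hadoop. It polls with -safemode get rather than -safemode wait so that no single call blocks indefinitely, and it assumes the command's output contains the text "Safe mode is OFF" once the NameNode has left safe mode. The 300-second limit and 2-second interval in the usage line are arbitrary.

```bash
#!/usr/bin/env bash
# Sketch only: retry_until and hdfs_ready are hypothetical helper names,
# not part of Hadoop.

# retry_until MAX_SECONDS INTERVAL COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds (returns 0), or gives up once
# MAX_SECONDS have elapsed (returns 1).
retry_until() {
    local max_seconds="$1" interval="$2" deadline
    shift 2
    deadline=$(( $(date +%s) + max_seconds ))
    until "$@"; do
        if [ "$(date +%s)" -ge "$deadline" ]; then
            return 1
        fi
        sleep "$interval"
    done
    return 0
}

# Succeeds only when the NameNode answers and reports safe mode as off.
# Assumption: the output contains "Safe mode is OFF". A refused connection
# prints nothing on stdout, so grep fails and the caller retries.
hdfs_ready() {
    hdfs dfsadmin -safemode get 2>/dev/null | grep -q 'Safe mode is OFF'
}

# Example usage: give HDFS up to 5 minutes, polling every 2 seconds.
# retry_until 300 2 hdfs_ready || { echo "HDFS not ready, giving up" >&2; exit 1; }
```

Because a refused connection and an active safe mode both count as "not ready yet", the same loop covers the NameNode still booting as well as safe mode still being on.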
Upvotes: 6