WestCoastProjects

Reputation: 63022

HBase MasterNotRunningException though HMaster, RegionServer, and ZooKeeper are up

I have started HBase, and all of the daemons are running.

 $ jps
8482 HQuorumPeer
25105 RemoteMavenServer
9133 SecondaryNameNode
11883 HRegionServer
13793 Jps
8545 NameNode
8572 HMaster
11519 Main
25029 Main
8851 DataNode
9435 RunJar

Now let us try to list the tables:

hbase(main):004:0* list
        TABLE                                                                                                                                                   

ERROR: org.apache.hadoop.hbase.MasterNotRunningException: Retried 7 times

Here is some help for this command:
List all tables in hbase. Optional regular expression parameter could
be used to filter the output. Examples:

Tail of master log:

2013-05-17 22:48:35,609 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=localhost,60020,1368856115352

Tail of ZooKeeper log:

$ tail *zoo*.log
2013-05-18 00:14:27,651 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /127.0.0.1:49826
2013-05-18 00:14:27,652 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to establish new session at /127.0.0.1:49826
2013-05-18 00:14:27,666 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x13eb59ceb22001e with negotiated timeout 180000 for client /127.0.0.1:49826

Tail of regionserver log:

2013-05-18 00:08:35,416 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=2.03 MB, free=244.85 MB, max=246.88 MB, blocks=0, accesses=0, hits=0, hitRatio=0cachingAccesses=0, cachingHits=0, cachingHitsRatio=0evictions=0, evicted=0, evictedPerRun=NaN
2013-05-18 00:13:35,416 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=2.03 MB, free=244.85 MB, max=246.88 MB, blocks=0, accesses=0, hits=0, hitRatio=0cachingAccesses=0, cachingHits=0, cachingHitsRatio=0evictions=0, evicted=0, evictedPerRun=NaN
2013-05-18 00:18:35,416 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=2.03 MB, free=244.85 MB, max=246.88 MB, blocks=0, accesses=0, hits=0, hitRatio=0cachingAccesses=0, cachingHits=0, cachingHitsRatio=0evictions=0, evicted=0, evictedPerRun=NaN

More details (in response to @roman below): safe mode was already off.

fsck gives:

hadoop fsck /

.Status: HEALTHY
 Total size:    321466989 B
 Total dirs:    412
 Total files:   446
 Total blocks (validated):  355 (avg. block size 905540 B)
 Minimally replicated blocks:   355 (100.0 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:   334 (94.08451 %)
 Mis-replicated blocks:     0 (0.0 %)
 Default replication factor:    3
 Average block replication: 1.0
 Corrupt blocks:        0
 Missing replicas:      1109 (312.39438 %)
 Number of data-nodes:      1
 Number of racks:       1
FSCK ended at Sun May 19 13:09:14 PDT 2013 in 147 milliseconds

But, as you suspected, the HBase web UI is not running on port 60030. I don't see any errors in the HBase log that explain why.

More info @roman: hbase hbck just times out with the MasterNotRunningException:

stephenb@gondolin:/shared$ hbase hbck 
  13/05/19 13:16:16 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.3-1240972, built on 02/06/2012 10:48 GMT
  13/05/19 13:16:16 INFO zookeeper.ZooKeeper: Client environment:host.name=gondolin
  13/05/19 13:16:16 INFO zookeeper.ZooKeeper: Client environment:java.version=1.6.0_37
  13/05/19 13:16:16 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.
  13/05/19 13:16:16 INFO zookeeper.ZooKeeper: Client environment:java.home=/shared/jdk1.6.0_37/jre
  13/05/19 13:16:16 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/shared/hadoop-1.0.3/libexec/../lib/native/Linux-amd64-64:/shared/hbase/lib/native/Linux-amd64-64
  13/05/19 13:16:16 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
  13/05/19 13:16:16 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
  13/05/19 13:16:16 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
  13/05/19 13:16:16 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
  13/05/19 13:16:16 INFO zookeeper.ZooKeeper: Client environment:os.version=3.2.0-39-generic
  13/05/19 13:16:16 INFO zookeeper.ZooKeeper: Client environment:user.name=stephenb
  13/05/19 13:16:16 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/stephenb
  13/05/19 13:16:16 INFO zookeeper.ZooKeeper: Client environment:user.dir=/shared
  13/05/19 13:16:16 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection
  13/05/19 13:16:16 INFO zookeeper.ClientCnxn: Opening socket connection to server /127.0.0.1:2181
  13/05/19 13:16:16 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 24642@gondolin
  13/05/19 13:16:16 WARN client.ZooKeeperSaslClient: SecurityException: java.lang.SecurityException: Unable to locate a login configuration occurred when trying to find JAAS configuration.
  13/05/19 13:16:16 INFO client.ZooKeeperSaslClient: Client will not SASL-authenticate because the default JAAS configuration section 'Client' could not be found. If you are not using SASL, you may ignore this. On the other hand, if you expected SASL to work, please fix your JAAS configuration.
  13/05/19 13:16:16 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
  13/05/19 13:16:16 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x13eb59ceb22002f, negotiated timeout = 180000
  13/05/19 13:17:27 INFO client.HConnectionManager$HConnectionImplementation: Closed zookeeper sessionid=0x13eb59ceb22002f
  13/05/19 13:17:27 INFO zookeeper.ZooKeeper: Session: 0x13eb59ceb22002f closed
  13/05/19 13:17:27 INFO zookeeper.ClientCnxn: EventThread shut down
  13/05/19 13:17:27 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection
  13/05/19 13:17:27 INFO zookeeper.ClientCnxn: Opening socket connection to server /127.0.0.1:2181
  13/05/19 13:17:27 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 24642@gondolin
  13/05/19 13:17:27 WARN client.ZooKeeperSaslClient: SecurityException: java.lang.SecurityException: Unable to locate a login configuration occurred when trying to find JAAS configuration.
  13/05/19 13:17:27 INFO client.ZooKeeperSaslClient: Client will not SASL-authenticate because the default JAAS configuration section 'Client' could not be found. If you are not using SASL, you may ignore this. On the other hand, if you expected SASL to work, please fix your JAAS configuration.
  13/05/19 13:17:27 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
  13/05/19 13:17:27 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x13eb59ceb220030, negotiated timeout = 180000
  13/05/19 13:18:39 INFO client.HConnectionManager$HConnectionImplementation: Closed zookeeper sessionid=0x13eb59ceb220030
  13/05/19 13:18:39 INFO zookeeper.ZooKeeper: Session: 0x13eb59ceb220030 closed
  13/05/19 13:18:39 INFO zookeeper.ClientCnxn: EventThread shut down
  13/05/19 13:18:39 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection
  13/05/19 13:18:39 INFO zookeeper.ClientCnxn: Opening socket connection to server /127.0.0.1:2181
  13/05/19 13:18:39 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 24642@gondolin
  13/05/19 13:18:39 WARN client.ZooKeeperSaslClient: SecurityException: java.lang.SecurityException: Unable to locate a login configuration occurred when trying to find JAAS configuration.
  13/05/19 13:18:39 INFO client.ZooKeeperSaslClient: Client will not SASL-authenticate because the default JAAS configuration section 'Client' could not be found. If you are not using SASL, you may ignore this. On the other hand, if you expected SASL to work, please fix your JAAS configuration.
  13/05/19 13:18:39 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
  13/05/19 13:18:39 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x13eb59ceb220031, negotiated timeout = 180000
  13/05/19 13:18:51 DEBUG client.HConnectionManager$HConnectionImplementation: The connection to null was closed by the finalize method.
  13/05/19 13:18:51 DEBUG client.HConnectionManager$HConnectionImplementation: 
  13/05/19 13:29:18 INFO client.HConnectionManager$HConnectionImplementation: Closed zookeeper sessionid=0x13eb59ceb220039
    13/05/19 13:29:18 INFO zookeeper.ZooKeeper: Session: 0x13eb59ceb220039 closed
    13/05/19 13:29:18 INFO zookeeper.ClientCnxn: EventThread shut down
    Exception in thread "main" org.apache.hadoop.hbase.MasterNotRunningException: Retried 10 times
        at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:130)
        at org.apache.hadoop.hbase.util.HBaseFsck.connect(HBaseFsck.java:264)
        at org.apache.hadoop.hbase.util.HBaseFsck.exec(HBaseFsck.java:3331)
        at org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:3192)

Upvotes: 1

Views: 1634

Answers (1)

Roman Nikitchenko

Reputation: 13046

And the HBase web UI is not running, right? I had something similar after a total crash of a single-node pseudo-distributed cluster: HDFS was not able to exit safe mode.

  1. Check whether HDFS is in safe mode with hadoop dfsadmin -safemode get.
  2. If it is, manually force it to leave safe mode with hadoop dfsadmin -safemode leave.
  3. You should then see progress: at least the HBase web UI should become visible.
  4. Run an HDFS fsck: hadoop fsck / -move.
  5. If everything goes well, it is also worth running an hbase hbck check.
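
The steps above could be strung together roughly as follows. This is only a sketch: safemode_is_on and recover_hbase are hypothetical helper names, it assumes the hadoop and hbase commands are on the PATH, and it assumes the status line printed by the safemode check contains the text "Safe mode is ON" when safe mode is active.

```shell
#!/usr/bin/env bash

# Succeeds (exit status 0) when the text passed as $1 reports that safe mode is on.
# $1 is expected to be the output of `hadoop dfsadmin -safemode get`.
safemode_is_on() {
  case "$1" in
    *"Safe mode is ON"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical wrapper that walks through the steps in order.
recover_hbase() {
  local status
  # Step 1: ask the NameNode whether it is in safe mode.
  status="$(hadoop dfsadmin -safemode get)" || return 1
  echo "$status"

  # Step 2: only force an exit if safe mode is actually on.
  if safemode_is_on "$status"; then
    hadoop dfsadmin -safemode leave || return 1
  fi

  # Step 4: filesystem check; -move relocates corrupted files to /lost+found.
  hadoop fsck / -move

  # Step 5: HBase consistency check (this needs a reachable master).
  hbase hbck
}
```

Nothing runs until recover_hbase is called, so the functions can be sourced into a shell and the commands run one at a time instead if you prefer to watch each step.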

Other tips you may need:

  • Check which address the region server is bound to with netstat -n -a (look for the port from your configuration). It sometimes ends up bound to the wrong interface. Also search the forums: there was an issue with Hadoop binding and IPv6 (check this for example).
  • Check that Hadoop has really left safe mode with hadoop dfsadmin -safemode get. HBase does not finish starting until it has.
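
For the binding check, a small filter over the netstat output can make the listening address easier to spot. This is a sketch that assumes the Linux net-tools column layout (local address in the fourth column, state in the last); listen_addrs_for_port is a hypothetical helper name, and 60000 is used only because it is the default master port in HBase releases of this era, so substitute whatever your hbase-site.xml sets.

```shell
#!/usr/bin/env bash

# Reads `netstat -n -a` style output on stdin and prints the local
# address of every TCP socket that is LISTENing on the given port ($1).
listen_addrs_for_port() {
  awk -v port="$1" '
    $1 ~ /^tcp/ && $NF == "LISTEN" {
      # The port is whatever follows the last colon, which also
      # covers IPv6 forms such as :::60000.
      n = split($4, parts, ":")
      if (parts[n] == port) print $4
    }'
}

# Example usage (run manually):
#   netstat -n -a | listen_addrs_for_port 60000
```

If the address printed is not one the client can reach (the logs in the question show the client connecting to localhost/127.0.0.1), or only an IPv6 address such as :::60000 appears, that would fit the wrong-interface and IPv6 cases mentioned above.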

Upvotes: 1
