user228137

Reputation: 762

nodetool status: "error: No nodes present in the cluster. Has this node finished starting up?"

I'm trying to set up a 2-node Cassandra 2.1 cluster with the following configuration on each node (listen_address and rpc_address are set to each node's own IP, 10.20.0.52 or 10.20.0.53):

Cluster Name: 'Cluster1'
num_tokens: 256
listen_address: 10.20.0.52/10.20.0.53
rpc_address: 10.20.0.52/10.20.0.53
class_name: org.apache.cassandra.locator.SimpleSeedProvider
parameters:
    # seeds is actually a comma-delimited list of addresses.
    # Ex: "<ip1>,<ip2>,<ip3>"
    - seeds: "10.20.0.52"

I first start the seed node (52) and check nodetool status, which returns data only for 52. I then boot the non-seed node (53), and after a few seconds nodetool status on the seed node throws the following exception:

-- StackTrace --
java.lang.RuntimeException: No nodes present in the cluster. Has this node finished starting up?
        at org.apache.cassandra.dht.Murmur3Partitioner.describeOwnership(Murmur3Partitioner.java:131)
        at org.apache.cassandra.service.StorageService.getOwnership(StorageService.java:3912)
        at sun.reflect.GeneratedMethodAccessor26.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
        at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275)
        at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
        at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
        at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
        at com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:83)
        at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:206)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:647)
        at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:678)
        at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1443)
        at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)
        at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1307)
        at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1399)
        at javax.management.remote.rmi.RMIConnectionImpl.getAttribute(RMIConnectionImpl.java:637)
        at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:323)
        at sun.rmi.transport.Transport$1.run(Transport.java:200)
        at sun.rmi.transport.Transport$1.run(Transport.java:197)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
        at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$250(TCPTransport.java:683)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$$Lambda$1/1165999373.run(Unknown Source)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)

On the non-seed node (53), however, nodetool status returns the standard output with details only for itself (53). nodetool gossipinfo on the seed node (52) returns information about both nodes:

/10.20.0.52
  generation:1439824481
  heartbeat:2433
  SCHEMA:500091e4-e8ab-303d-9111-8cca7edff2d0
  HOST_ID:2d78ed48-13e8-4fc5-ac55-8b2a6d00c8c5
  NET_VERSION:8
  RELEASE_VERSION:2.1.8-SNAPSHOT
  STATUS:NORMAL,-1091407767707699731
  RPC_ADDRESS:10.20.0.52
  SEVERITY:0.5025125741958618
  DC:DC1
  LOAD:2524926.0
  RACK:RAC1
  INTERNAL_IP:10.20.0.52
/10.20.0.53
  generation:1439824502
  heartbeat:2376
  SCHEMA:500091e4-e8ab-303d-9111-8cca7edff2d0
  NET_VERSION:8
  HOST_ID:2d78ed48-13e8-4fc5-ac55-8b2a6d00c8c5
  RELEASE_VERSION:2.1.8-SNAPSHOT
  STATUS:NORMAL,-1091407767707699731
  RPC_ADDRESS:10.20.0.53
  SEVERITY:0.0
  DC:DC1
  LOAD:2603302.0
  RACK:RAC1
  INTERNAL_IP:10.20.0.53

On the non-seed node, however, nodetool gossipinfo displays information only about itself and does not include the seed node (52).

Another discrepancy between the state/information about the 2 nodes is the output of nodetool netstats which for the seed node (52) shows:

ubuntu@52:~$ nodetool netstats 
Mode: NORMAL
Not sending any streams.
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name                    Active   Pending      Completed
Commands                        n/a         0              0
Responses                       n/a         0           1135

while for the non-seed node (53) the number of completed responses is roughly double that of the seed node:

ubuntu@53:~$ nodetool netstats 
Mode: NORMAL
Not sending any streams.
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name                    Active   Pending      Completed
Commands                        n/a         0              0
Responses                       n/a         0           2388

Source code

Given the stack trace, I inserted some flags to print what appears to be causing the error at L206 of Murmur3Partitioner.java, where the describeOwnership method is called:
- the method is called when the seed node is booted
- the method is called when the non-seed node is bootstrapped

Both times the list of tokens (sortedTokens) is exactly the same, yet the iterator is empty, which triggers the error in the title.

Note: the relevant ports (7000, 7001) on both nodes (52, 53) are open.

Update #1: I found out (thanks to the #cassandra IRC channel) that if two nodes have the same tokens, a conflict is created and one of them will fail to bootstrap.
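The conflict described in this update can be spotted directly in the gossipinfo output shown earlier, where both endpoints report the same HOST_ID and the same STATUS token. Below is a minimal Python sketch that parses that text and lists any value shared by more than one endpoint. The function names `parse_gossipinfo` and `find_duplicates` are hypothetical helpers, not part of nodetool, and the parser assumes the `/address` plus indented `KEY:value` layout seen in the output above; other Cassandra versions may print it differently.

```python
from collections import defaultdict


def parse_gossipinfo(text):
    """Parse `nodetool gossipinfo` text into {endpoint: {key: value}}.

    Assumes the layout shown in the question: a line starting with "/"
    names the endpoint, followed by indented KEY:value lines.
    """
    nodes = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("/"):
            current = line.lstrip("/")
            nodes[current] = {}
        elif current is not None and ":" in line:
            key, value = line.split(":", 1)
            nodes[current][key] = value
    return nodes


def find_duplicates(nodes, key):
    """Return {value: [endpoints]} for values of `key` seen on 2+ endpoints."""
    seen = defaultdict(list)
    for endpoint, fields in nodes.items():
        if key in fields:
            seen[fields[key]].append(endpoint)
    return {value: eps for value, eps in seen.items() if len(eps) > 1}


# Abbreviated copy of the gossipinfo output from the question.
sample = """\
/10.20.0.52
  HOST_ID:2d78ed48-13e8-4fc5-ac55-8b2a6d00c8c5
  STATUS:NORMAL,-1091407767707699731
/10.20.0.53
  HOST_ID:2d78ed48-13e8-4fc5-ac55-8b2a6d00c8c5
  STATUS:NORMAL,-1091407767707699731
"""

nodes = parse_gossipinfo(sample)
for field in ("HOST_ID", "STATUS"):
    for value, endpoints in find_duplicates(nodes, field).items():
        print(f"{field} {value} is shared by {', '.join(endpoints)}")
```

In practice the text would come from running nodetool gossipinfo on a node and feeding its output to `parse_gossipinfo`; an empty result from `find_duplicates` means no two endpoints share that field.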

To address this I tried the following:

cqlsh> DROP KEYSPACE ycsb;

which didn't fix the issue - nodetool ring still showed the same tokens corresponding to the non-seed node; I also flushed the changes after closing cqlsh. Then:

sudo rm -rf /var/lib/cassandra/data/*
sudo rm -rf /var/lib/cassandra/commitlog/*
sudo rm -rf /var/lib/cassandra/saved_caches/*

which still didn't reduce or change the tokens that show up in nodetool ring.

Any guidance is appreciated.

Upvotes: 1

Views: 3189

Answers (1)

user228137

Reputation: 762

The culprit appears to have been the ports and firewall rules, which did not allow the nodes to establish connections in both directions and exchange the tokens residing on each node. The troubleshooting steps taken were:
1) netstat -l on both nodes to see which ports are open/listening;
2) nmap from one node to the other to scan for open ports;
3) nodetool ring to compare the tokens on both nodes;
4) TRACE logging level set in logback.xml, with output sent either to a separate log file or to stderr.
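
Steps 1 and 2 can be approximated without nmap by attempting a plain TCP connection to the peer's storage ports. The following is a minimal Python sketch, not a replacement for a proper firewall audit: `port_is_reachable` and `check_ports` are hypothetical helper names, and the ports 7000 and 7001 are the ones mentioned in the question. A successful connect only proves reachability from the machine running the script, so it has to be run on each node against the other to confirm both directions.

```python
import socket


def port_is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_ports(host, ports, timeout=2.0):
    """Map each port to whether it accepted a TCP connection from this machine."""
    return {port: port_is_reachable(host, port, timeout) for port in ports}


# Example call (run on 10.20.0.52 to test the path towards 10.20.0.53,
# then on 10.20.0.53 with the addresses swapped):
#   check_ports("10.20.0.53", (7000, 7001))
```

If a port is reported as reachable from 52 to 53 but not from 53 to 52, that points to the kind of one-directional firewall rule described above.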

I also recommend discussing your issues in the #cassandra IRC channel. The folks there are very knowledgeable and can help in almost real time.

Hope it helps!

Upvotes: 1
