Reputation: 762
I'm trying to set up a 2-node Cassandra 2.1 cluster with the following node configurations:
Cluster Name: 'Cluster1'
num_tokens: 256
listen_address: 10.20.0.52/10.20.0.53
rpc_address: 10.20.0.52/10.20.0.53
class_name: org.apache.cassandra.locator.SimpleSeedProvider
parameters:
# seeds is actually a comma-delimited list of addresses.
# Ex: "<ip1>,<ip2>,<ip3>"
- seeds: "10.20.0.52"
I first start the seed node (52) and check nodetool status, which returns data only for 52. But then I boot the second node (53), and nodetool status throws the following exception after a few seconds:
-- StackTrace --
java.lang.RuntimeException: No nodes present in the cluster. Has this node finished starting up?
at org.apache.cassandra.dht.Murmur3Partitioner.describeOwnership(Murmur3Partitioner.java:131)
at org.apache.cassandra.service.StorageService.getOwnership(StorageService.java:3912)
at sun.reflect.GeneratedMethodAccessor26.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275)
at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
at com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:83)
at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:206)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:647)
at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:678)
at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1443)
at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)
at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1307)
at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1399)
at javax.management.remote.rmi.RMIConnectionImpl.getAttribute(RMIConnectionImpl.java:637)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:323)
at sun.rmi.transport.Transport$1.run(Transport.java:200)
at sun.rmi.transport.Transport$1.run(Transport.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$250(TCPTransport.java:683)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$$Lambda$1/1165999373.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)
but on the non-seed node (53) it returns the standard output with details only for itself (53).
nodetool gossipinfo
on the seed node (52) returns information about both nodes:
/10.20.0.52
generation:1439824481
heartbeat:2433
SCHEMA:500091e4-e8ab-303d-9111-8cca7edff2d0
HOST_ID:2d78ed48-13e8-4fc5-ac55-8b2a6d00c8c5
NET_VERSION:8
RELEASE_VERSION:2.1.8-SNAPSHOT
STATUS:NORMAL,-1091407767707699731
RPC_ADDRESS:10.20.0.52
SEVERITY:0.5025125741958618
DC:DC1
LOAD:2524926.0
RACK:RAC1
INTERNAL_IP:10.20.0.52
/10.20.0.53
generation:1439824502
heartbeat:2376
SCHEMA:500091e4-e8ab-303d-9111-8cca7edff2d0
NET_VERSION:8
HOST_ID:2d78ed48-13e8-4fc5-ac55-8b2a6d00c8c5
RELEASE_VERSION:2.1.8-SNAPSHOT
STATUS:NORMAL,-1091407767707699731
RPC_ADDRESS:10.20.0.53
SEVERITY:0.0
DC:DC1
LOAD:2603302.0
RACK:RAC1
INTERNAL_IP:10.20.0.53
but on the non-seed node it only displays information about itself and does not include the seed node (52).
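To make the comparison of the two gossip entries less error-prone, the output can be parsed and checked for values that more than one endpoint reports. The following is only a rough sketch: the helper names `parse_gossipinfo` and `find_shared_values` are hypothetical, and it assumes the `KEY:value` layout shown in the gossipinfo output above (other Cassandra versions may format these lines differently). The sample text is a shortened copy of that output.

```python
from collections import defaultdict


def parse_gossipinfo(text):
    """Parse gossipinfo-style text into {endpoint: {KEY: value}}.

    Assumes each endpoint starts with a line like "/10.20.0.52" followed
    by "KEY:value" lines, as in the output quoted in the question.
    """
    nodes = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("/"):
            current = line.lstrip("/")
            nodes[current] = {}
        elif current is not None and ":" in line:
            key, value = line.split(":", 1)
            nodes[current][key.strip()] = value.strip()
    return nodes


def find_shared_values(nodes, keys=("HOST_ID", "STATUS")):
    """Return {key: {value: [endpoints]}} for values seen on 2+ endpoints."""
    shared = {}
    for key in keys:
        by_value = defaultdict(list)
        for endpoint, fields in nodes.items():
            if key in fields:
                by_value[fields[key]].append(endpoint)
        dupes = {v: eps for v, eps in by_value.items() if len(eps) > 1}
        if dupes:
            shared[key] = dupes
    return shared


# Shortened copy of the gossipinfo output from the seed node (52).
sample = """\
/10.20.0.52
generation:1439824481
HOST_ID:2d78ed48-13e8-4fc5-ac55-8b2a6d00c8c5
STATUS:NORMAL,-1091407767707699731
/10.20.0.53
generation:1439824502
HOST_ID:2d78ed48-13e8-4fc5-ac55-8b2a6d00c8c5
STATUS:NORMAL,-1091407767707699731
"""

nodes = parse_gossipinfo(sample)
shared = find_shared_values(nodes)
for key, dupes in shared.items():
    for value, endpoints in dupes.items():
        print(f"{key} {value} is reported by: {', '.join(endpoints)}")
```

On the output above, this flags that both endpoints report the same HOST_ID and the same STATUS token, which is worth keeping in mind alongside the token conflict described in the update below.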
Another discrepancy between the state/information about the 2 nodes is the output of nodetool netstats
which for the seed node (52) shows:
ubuntu@52:~$ nodetool netstats
Mode: NORMAL
Not sending any streams.
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name Active Pending Completed
Commands n/a 0 0
Responses n/a 0 1135
while for the non-seed node (53) the number of completed responses is roughly double that of the seed node:
ubuntu@53:~$ nodetool netstats
Mode: NORMAL
Not sending any streams.
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name Active Pending Completed
Commands n/a 0 0
Responses n/a 0 2388
Source code
Given the stack trace, I tried inserting some flags and print statements around what appears to be causing the error (L206 of Murmur3Partitioner.java) when the describeOwnership method is called:
- the method is called when the seed node is booted
- the method is called when the non-seed node is bootstrapped
Both times the list of tokens (sortedTokens) is exactly the same, yet the iterator is empty and triggers the error in the title.
Note: the relevant ports (7000, 7001) on both nodes (52, 53) are open.
Update #1: I found out (thanks to the #cassandra IRC channel) that if two nodes have the same tokens, a conflict is created and one of them will fail to bootstrap.
To address this I tried the following: cqlsh> DROP KEYSPACE ycsb ;
which didn't fix the issue: nodetool ring still showed the same tokens for the non-seed node. I also flushed the changes after closing cqlsh. Then:
sudo rm -rf /var/lib/cassandra/data/*
sudo rm -rf /var/lib/cassandra/commitlog/*
sudo rm -rf /var/lib/cassandra/saved_caches/*
which still didn't reduce or change the tokens that show up in nodetool ring.
Any guidance is appreciated.
Upvotes: 1
Views: 3189
Reputation: 762
The culprit appears to have been the ports and firewall rules, which did not allow the nodes to establish bidirectional connections and exchange the tokens residing on each node. The troubleshooting steps taken were:
1) netstat -l on both nodes to see which ports are open/listening;
2) nmap from one node to the other to scan for open ports;
3) nodetool ring to compare the tokens on both nodes;
4) TRACE logging level set in logback.xml, with output sent either to a separate log file or to stderr.
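If nmap is not available, the reachability part of steps 1 and 2 can be approximated with a plain TCP connect. This is a minimal sketch, not what was actually run: the function names are hypothetical, and the ports 7000 and 7001 are the inter-node ports mentioned in the question. Since the problem was connections not working in both directions, it would need to be run on each node against the other.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_ports(host, ports, timeout=3.0):
    """Map each port to whether it accepted a TCP connection."""
    return {port: port_is_reachable(host, port, timeout) for port in ports}


# Example (run on 10.20.0.52, then swap the address and run on 10.20.0.53):
#   for port, ok in check_ports("10.20.0.53", (7000, 7001)).items():
#       print(port, "open" if ok else "blocked or closed")
```

A successful connect only shows that the port accepts TCP connections from that host; it says nothing about whether gossip itself is healthy, so the nodetool ring comparison and TRACE logs are still needed.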
I also recommend discussing your issues on the #cassandra IRC channel. The folks there are very knowledgeable and can help in almost real time.
Hope it helps!
Upvotes: 1