Eric Lubow

Reputation: 803

How do I bootstrap a Cassandra node when there are missing token ranges with vnodes?

I have a 12-node cluster on AWS running Cassandra 1.2.11 (DSE). I lost one of the nodes when its ephemeral drive on Amazon, which held the data, failed. To deal with this, I removed the node with nodetool removenode $hostid, which worked. The cluster still appears to be balanced and otherwise normal.

The problem is that when I try to bootstrap a new node, I now get errors like this:

java.lang.IllegalStateException: unable to find sufficient sources for streaming range (-2556758013916855401,-2545694469859252228]
at org.apache.cassandra.dht.RangeStreamer.getRangeFetchMap(RangeStreamer.java:205)
at org.apache.cassandra.dht.RangeStreamer.addRanges(RangeStreamer.java:129)
at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:81)
at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:975)
at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:741)
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:585)
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:482)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:348)
at com.datastax.bdp.server.DseDaemon.setup(DseDaemon.java:351)
at org.apache.cassandra.service.CassandraDaemon.init(CassandraDaemon.java:381)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.commons.daemon.support.DaemonLoader.load(DaemonLoader.java:212)
Cannot load daemon
Service exit with a return value of 3

What doesn't make sense is that, apart from the OpsCenter keyspace, which uses SimpleStrategy, all of the keyspaces use NetworkTopologyStrategy with an RF of 3. My workaround so far has been to find out which node owns the range the bootstrap is failing on, run nodetool repair -pr on that node, and then retry the bootstrap. This might eventually work once I have run a repair around the entire cluster (which could take days), but in the meantime I am down a node and the cluster is running in a degraded state. If I lose another node, I am in serious trouble.
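The first step of that workaround, finding which node owns the failing range, can be sketched as follows. A range (start, end] is primarily owned by the node holding the first token at or after end, wrapping around the ring. This is a minimal sketch under assumptions: the token-to-address mapping is hypothetical and would in practice be transcribed from nodetool ring, and only the primary owner is computed, not the other replicas.

```python
from bisect import bisect_left

def primary_owner(ring, range_end):
    """Return the node holding the first token >= range_end, wrapping around.

    `ring` maps token -> node address (hypothetical data, e.g. copied
    by hand from `nodetool ring` output).
    """
    tokens = sorted(ring)
    i = bisect_left(tokens, range_end)
    if i == len(tokens):  # past the highest token: wrap to the lowest one
        i = 0
    return ring[tokens[i]]

# Hypothetical three-token excerpt of a ring; addresses are made up.
ring = {
    -2556758013916855401: "10.0.0.4",
    -2545694469859252228: "10.0.0.7",
    3074457345618258602: "10.0.0.1",
}

# The failing range from the stack trace ends at -2545694469859252228.
print(primary_owner(ring, -2545694469859252228))
```

Because nodetool repair -pr repairs only a node's primary ranges, this owner is the node on which that command would cover the failing range.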

What should I be doing here and how can I get around this issue and force the node to bootstrap?

Upvotes: 3

Views: 1633

Answers (1)

Eric Lubow

Reputation: 803

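I figured out the issue. The OpsCenter keyspace uses SimpleStrategy and comes with a default replication_factor of 1, so the lost node held the only copy of part of its data. With no other replica to stream those ranges from, the new node could not bootstrap. The solution was to switch the OpsCenter keyspace to NetworkTopologyStrategy with 3 replicas in the data center named Cassandra, using the following cassandra-cli command: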

UPDATE KEYSPACE OpsCenter
  WITH placement_strategy = 'NetworkTopologyStrategy'
   AND strategy_options = {Cassandra : 3};
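Why the replication factor matters here can be illustrated with a simplified model: one node cannot act as a streaming source, and we count the vnode ranges left with no other replica. This is a sketch, not Cassandra's RangeStreamer logic. It assumes a single data center and a single rack, so replica placement is a plain clockwise walk for distinct nodes (real NetworkTopologyStrategy also spreads replicas across racks). The ring, node names, and 256 tokens per node are hypothetical.

```python
import random

MIN_TOKEN, MAX_TOKEN = -2**63, 2**63 - 1  # Murmur3Partitioner token range

def build_ring(num_nodes, vnodes_per_node, seed=42):
    """Hypothetical ring: each node gets `vnodes_per_node` random tokens."""
    rng = random.Random(seed)
    ring = {}
    for n in range(num_nodes):
        for _ in range(vnodes_per_node):
            ring[rng.randint(MIN_TOKEN, MAX_TOKEN)] = f"node{n}"
    return ring

def replicas(tokens, ring, index, rf):
    """Walk clockwise from tokens[index], collecting rf distinct nodes
    (single-DC, single-rack simplification)."""
    found = []
    for step in range(len(tokens)):
        node = ring[tokens[(index + step) % len(tokens)]]
        if node not in found:
            found.append(node)
            if len(found) == rf:
                break
    return found

def ranges_without_source(ring, rf, unavailable):
    """Count ranges (tokens[i-1], tokens[i]] whose only replicas are `unavailable`."""
    tokens = sorted(ring)
    return sum(
        1
        for i in range(len(tokens))
        if all(node == unavailable for node in replicas(tokens, ring, i, rf))
    )

ring = build_ring(num_nodes=12, vnodes_per_node=256)
for rf in (1, 3):
    missing = ranges_without_source(ring, rf, "node0")
    print(f"RF={rf}: {missing} ranges with no live source")
```

With RF 1 every range whose single replica is the unavailable node has nowhere to stream from, while with RF 3 each range still has two other replicas.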

This allows the bootstrap to proceed. All of the nodes then need to be repaired, and until the repairs are complete, requests to the OpsCenter keyspace may return missing data. But since OpsCenter is not required for the cluster to operate properly, being able to replace nodes matters more here.
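Because nodetool repair -pr repairs only each node's primary ranges, covering the whole keyspace means running it once on every node. A small sketch that builds those invocations one node at a time; the host list is hypothetical and the commands are printed rather than executed.

```python
import shlex

def repair_commands(hosts, keyspace):
    """Build one `nodetool repair -pr` invocation per node.

    -pr limits each repair to that node's primary ranges, so it has to
    run on every node in the ring for the whole keyspace to be covered.
    """
    return [["nodetool", "-h", host, "repair", "-pr", keyspace] for host in hosts]

# Hypothetical host list; substitute the cluster's real addresses.
hosts = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
for cmd in repair_commands(hosts, "OpsCenter"):
    # Printed only; to run them sequentially, use subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Running the repairs sequentially rather than in parallel keeps the extra load on the already degraded cluster down.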

Upvotes: 5
