Reputation: 23633
I have a small cluster that is virtually empty. Usually nodetool removenode completes in tens of seconds. However, I currently have a node removal in progress that has been running for tens of minutes and does not seem to be making any progress. An additional request to remove the node is rejected because a removal is already in progress. How can I troubleshoot this? For reference, here is the output of nodetool status:
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns   Host ID                               Rack
DL  192.168.12.207  152.14 KB  256     32.2%  683d8351-c625-4d7f-99cc-61f6b73b0c56  rack1
UN  192.168.12.205  215.21 KB  256     37.2%  b66d5fff-ef1d-4fbf-a49a-43709df99a0c  rack1
UN  192.168.12.208  148.09 KB  256     30.6%  39b54771-59b8-49f7-8db8-9cf4523d6c8d  rack1
Also, Cassandra is not running on host 207 (the leaving host), but it is running on the other two hosts.
EDIT: It seems there is at least one token that is stuck awaiting replication:
$ nodetool removenode status
RemovalStatus: Removing token (-9037887679483580088). Waiting for replication confirmation from [/192.168.12.205].
Upvotes: 14
Views: 12004
Reputation: 361
I don't know which version of Cassandra you are running, but if nodetool removenode is not working, the Apache Cassandra Wiki suggests the following:
Removenode
Removing a node that does not physically exist anymore is done in two steps:
bin/nodetool removenode <UUID>
bin/nodetool removenode force
The first command will block forever if the computer attached to that UUID was physically removed (or does not run Cassandra anymore). Just click Ctrl-C after a second or two before running the second command. Obviously, it is better to first decommission a node if possible or you may lose some of your data.
The "bin/nodetool status" command shows the UUID of your nodes.
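As a rough sketch of how to find the UUID to pass to the first command, the helper below pulls the Host ID of any node marked DL (down and leaving) out of nodetool status output. The function name down_leaving_host_ids is made up for illustration, and it assumes the column layout shown in the question, where Host ID is the second-to-last field on each node line.

```shell
# Hypothetical helper: print the Host ID of every node whose first
# column is "DL" (Down + Leaving) in `nodetool status` output on stdin.
# Assumes Host ID is the second-to-last whitespace-separated field,
# as in the output shown in the question.
down_leaving_host_ids() {
  awk '$1 == "DL" { print $(NF - 1) }'
}
```

You would run it as `nodetool status | down_leaving_host_ids`, then use the printed UUID with `nodetool removenode <UUID>`, check progress with `nodetool removenode status`, and fall back to `nodetool removenode force` only if the removal stays stuck.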
Hope it helps.
Upvotes: 17