evyatars

Reputation: 81

Upgraded Cassandra 3.11 to 4.0, failed with "node with address ... already exists"

We are trying to upgrade Apache Cassandra 3.11.12 to 4.0.2. This is the first node we are upgrading in this cluster (a seed node). We drained the node and stopped the service before replacing the version.
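Concretely, the drain and stop looked roughly like this (assuming a package install managed by systemd; the service name may differ on other setups):

nodetool drain                  # flush memtables and stop accepting new writes
sudo systemctl stop cassandra   # stop the 3.11.12 process before swapping binaries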

system log:

INFO  [RMI TCP Connection(16)-IP] 2022-03-03 15:50:18,811 StorageService.java:1568 - DRAINED
....
....
INFO  [main] 2022-03-03 15:58:02,970 QueryProcessor.java:167 - Preloaded 0 prepared statements
INFO  [main] 2022-03-03 15:58:02,970 StorageService.java:735 - Cassandra version: 4.0.2
INFO  [main] 2022-03-03 15:58:02,971 StorageService.java:736 - CQL version: 3.4.5
INFO  [main] 2022-03-03 15:58:02,971 StorageService.java:737 - Native protocol supported versions: 3/v3, 4/v4, 5/v5, 6/v6-beta (default: 5/v5)
...
...
WARN  [main] 2022-03-03 15:58:03,328 SystemKeyspace.java:1130 - No host ID found, created d78ab047-f1f9-4a07-8118-2fa83f4571ef (Note: This should happen exactly once per node).
....
...
ERROR [main] 2022-03-03 15:58:04,543 CassandraDaemon.java:911 - Exception encountered during startup
java.lang.RuntimeException: A node with address /HOST_IP:7001 already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.
        at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:660)
        at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:935)
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:785)
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:730)
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:420)
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:765)
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:889)
INFO  [StorageServiceShutdownHook] 2022-03-03 15:58:04,558 HintsService.java:222 - Paused hints dispatch
WARN  [StorageServiceShutdownHook] 2022-03-03 15:58:04,561 Gossiper.java:2032 - No local state, state is in silent shutdown, or node hasn't joined, not announcing shutdown
INFO  [StorageServiceShutdownHook] 2022-03-03 15:58:04,561 MessagingService.java:441 - Waiting for messaging service to quiesce
...
..
INFO  [StorageServiceShutdownHook] 2022-03-03 15:58:06,956 HintsService.java:222 - Paused hints dispatch

Did we need to delete (rm -rf) the system* data directories after draining the node, before starting the new Cassandra version? (We didn't do that.) How can we solve this problem?

Upvotes: 3

Views: 484

Answers (1)

Erick Ramirez

Reputation: 16393

During startup, Cassandra tries to retrieve the host ID by querying the local system table with:

SELECT host_id FROM system.local WHERE key = 'local'

But if the system.local table is empty or the SSTables are missing from the system/local-*/ data subdirectories, Cassandra assumes that it is a brand new node and assigns a new host ID. However, in your case, Cassandra realises that another node with the same IP address is already part of the cluster when it gossips with the other nodes.
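A quick way to confirm whether those SSTables are still on disk is to list the system.local data subdirectory, for example (assuming the default data directory /var/lib/cassandra/data):

# If this prints nothing, the SSTables backing system.local are gone
ls -l /var/lib/cassandra/data/system/local-*/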

You need to figure out why Cassandra can't access the system.local table. If someone deleted system/local-*/ from the data directory, then you won't be able to start the node again. If that was the case, you'll need to start from scratch, which involves the following (a rough command sketch follows the list):

  • wipe all the contents of data/, commitlog/ and saved_caches/
  • uninstall C* 4.0
  • reinstall C* 3.11
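Assuming the default package locations under /var/lib/cassandra and a systemd-managed service, the wipe step would look roughly like this:

# Make sure the node is stopped before touching its directories
sudo systemctl stop cassandra
# Remove all data, commit logs and saved caches (this destroys the node's local state)
sudo rm -rf /var/lib/cassandra/data/* /var/lib/cassandra/commitlog/* /var/lib/cassandra/saved_caches/*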

You will then need to replace the node "with itself" using the replace_address method. Cheers!
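As a sketch of that last step (the exact file depends on your install; for a package install of C* 3.11 it is typically /etc/cassandra/cassandra-env.sh), you start the rebuilt node once with the replace-address flag pointing at its own IP:

# Add to cassandra-env.sh before the first start of the reinstalled node
JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=<node_ip>"

Once the node has finished bootstrapping and shows as UN in nodetool status, remove that line so it isn't applied on later restarts.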

Upvotes: 5
