The Tomahawk

Reputation: 153

Issues migrating to Apache Cassandra 3.11.3 from DSE 5.0.9

We are looking at migrating from DSE 5.0.9 to Apache Cassandra 3.11.3. We've gotten quite far and managed to fix various issues (including the EverywhereStrategy one) but are running into an issue with the system.local table.

The migration/upgrade was done on just one server, so far. When we start Cassandra 3.11.3 on this one node we get an error when loading system.local:

INFO [main] 2018-12-07 10:56:12,963 ColumnFamilyStore.java:411 - Initializing system.local
INFO [SSTableBatchOpen:1] 2018-12-07 10:56:12,993 BufferPool.java:230 - Global buffer pool is enabled, when pool is exhausted (max is 512.000MiB) it will allocate on heap
ERROR [SSTableBatchOpen:1] 2018-12-07 10:56:13,013 DebuggableThreadPoolExecutor.java:239 - Error in ThreadPoolExecutor
java.lang.RuntimeException: Unknown column server_id during deserialization
at org.apache.cassandra.db.SerializationHeader$Component.toHeader(SerializationHeader.java:321) ~[apache-cassandra-3.11.3.jar:3.11.3]
at org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:522) ~[apache-cassandra-3.11.3.jar:3.11.3]
at org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:385) ~[apache-cassandra-3.11.3.jar:3.11.3]
at org.apache.cassandra.io.sstable.format.SSTableReader$3.run(SSTableReader.java:570) ~[apache-cassandra-3.11.3.jar:3.11.3]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_172]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_172]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_172]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_172]
at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) [apache-cassandra-3.11.3.jar:3.11.3]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_172]
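
To see at a glance which columns the 3.11.3 reader rejects, the log can be scanned for this message. A minimal sketch in Python, assuming the message format shown in the trace above; the function name and the sample lines are illustrative only:

```python
import re

# Matches the message emitted in the trace above (format assumed from that log).
UNKNOWN_COLUMN = re.compile(r"Unknown column (\S+) during deserialization")


def unknown_columns(log_lines):
    """Return the distinct column names reported as unknown, in first-seen order."""
    seen = []
    for line in log_lines:
        match = UNKNOWN_COLUMN.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


# Sample lines copied from the log in this question.
sample = [
    "INFO [main] 2018-12-07 10:56:12,963 ColumnFamilyStore.java:411 - Initializing system.local",
    "java.lang.RuntimeException: Unknown column server_id during deserialization",
]
print(unknown_columns(sample))
```

In practice the lines would come from the node's system.log rather than a hard-coded list.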

Looking at another Cassandra 3.11.3 cluster we have here, server_id doesn't exist in the system.local table. It does, however, exist in the DSE 5.0.9 version of the table. Because system.local cannot be loaded, we then get the following warning:
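
For reference, the comparison between the two clusters can be reduced to a set difference over column names. This is only a sketch: it assumes the names were collected on each cluster with something like `SELECT column_name FROM system_schema.columns WHERE keyspace_name = 'system' AND table_name = 'local';`, and the lists below are deliberately incomplete, illustrative samples rather than the real schemas.

```python
def extra_columns(source_columns, target_columns):
    """Return columns present in the source schema but missing from the target, sorted."""
    return sorted(set(source_columns) - set(target_columns))


# Illustrative, incomplete column lists (assumption: not the full system.local schema).
dse_local = ["key", "cluster_name", "host_id", "server_id"]
oss_local = ["key", "cluster_name", "host_id"]

print(extra_columns(dse_local, oss_local))
```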

WARN [main] 2018-12-06 10:43:57,241 SystemKeyspace.java:1087 - No host ID found, created a0bb8c11-2864-4d58-9c0c-59b97b16c48e (Note: This should happen exactly once per node).

(there is no host ID because system.local didn't load), which then causes the following error:

ERROR [main] 2018-12-06 10:43:58,295 CassandraDaemon.java:708 - Exception encountered during startup
java.lang.RuntimeException: A node with address dubdc1-oatjeeramp2dmcassandra-04/10.109.158.254 already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.
at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:558) ~[apache-cassandra-3.11.3.jar:3.11.3]
at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:804) ~[apache-cassandra-3.11.3.jar:3.11.3]
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:664) ~[apache-cassandra-3.11.3.jar:3.11.3]
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:613) ~[apache-cassandra-3.11.3.jar:3.11.3]
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:379) [apache-cassandra-3.11.3.jar:3.11.3]
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:602) [apache-cassandra-3.11.3.jar:3.11.3]
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:691) [apache-cassandra-3.11.3.jar:3.11.3]

At this point system.local has been overwritten with the new host ID value, and Cassandra has shut down.

Adding -Dcassandra.replace_node=<ip address> to cassandra-env.sh results in an error saying that the node has already been bootstrapped so can't be used. I know I can get around this by deleting all of the data, but I really don't want to have to do that.

Restoring a backup of system.local allows us to start DSE again. The node is currently back running DSE 5.0.9.
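
Since each failed start overwrites system.local, it helps to copy the table's data directory before every attempt. A minimal sketch, assuming the node is stopped while the copy is taken; the function name and the paths in the usage comment are hypothetical and depend on the install:

```python
import shutil
import time
from pathlib import Path


def backup_table_dir(table_dir, backup_root):
    """Copy one table's data directory into a timestamped folder and return the new path."""
    src = Path(table_dir)
    dest = Path(backup_root) / f"{src.name}-{time.strftime('%Y%m%d-%H%M%S')}"
    # copytree creates dest (and missing parents) and fails if dest already exists.
    shutil.copytree(src, dest)
    return dest


# Hypothetical usage; substitute the real data directory of system.local:
# backup_table_dir("/var/lib/cassandra/data/system/local-<table id>", "/backups/cassandra")
```

Restoring is then a matter of copying the saved directory back into place before starting DSE.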

Has anyone seen this issue before, and do you have any advice on how to resolve it?

Upvotes: 1

Views: 775

Answers (1)

cdatta

Reputation: 279

Steps:

  1. Copied the configuration settings that are available in both, unchanged, from DSE to OSS C*.
  2. Altered a few keyspaces/tables:

    alter keyspace dse_system with replication = {'class': 'NetworkTopologyStrategy', 'DC3': '3'}; //DC1,DC2=OSS C*

    // if you are using Spark
    alter table cfs_archive.sblocks with compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'};

    alter table cfs.sblocks with compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'};

  3. Set the following:

    auto_bootstrap: false
    JVM_OPTS="$JVM_OPTS -Dcassandra.allow_unsafe_replace=true"
    JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=...

Be careful and test everything in a lower environment first. See this link for additional information: https://www.mail-archive.com/[email protected]/msg58077.html

Upvotes: 2
