Reputation: 715
I have a two-node Cassandra cluster that uses the SimpleStrategy replication strategy.
Everything worked well until one of the nodes crashed. I recovered the crashed node by cloning the virtual machine of the remaining node (i.e., cloning its file system) and updating the listen and RPC addresses.
Now I keep getting a strange error: when I run each node on its own, everything works fine, but as soon as I start the second node, the first one fails with the following exception:
ERROR [Native-Transport-Requests-1] 2020-07-21 08:19:31,042 Message.java:693 - Unexpected exception during request; channel = [id: 0xc1935e7a, L:/192.168.40.15:9042 - R:/192.168.40.15:47980]
java.lang.AssertionError: null
at org.apache.cassandra.locator.TokenMetadata.firstTokenIndex(TokenMetadata.java:1065) ~[apache-cassandra-3.11.4.jar:3.11.4]
at org.apache.cassandra.locator.TokenMetadata.firstToken(TokenMetadata.java:1079) ~[apache-cassandra-3.11.4.jar:3.11.4]
at org.apache.cassandra.locator.AbstractReplicationStrategy.getNaturalEndpoints(AbstractReplicationStrategy.java:107) ~[apache-cassandra-3.11.4.jar:3.11.4]
at org.apache.cassandra.service.StorageService.getLiveNaturalEndpoints(StorageService.java:3866) ~[apache-cassandra-3.11.4.jar:3.11.4]
at org.apache.cassandra.service.StorageService.getLiveNaturalEndpoints(StorageService.java:3852) ~[apache-cassandra-3.11.4.jar:3.11.4]
at org.apache.cassandra.service.StorageProxy.getLiveSortedEndpoints(StorageProxy.java:1914) ~[apache-cassandra-3.11.4.jar:3.11.4]
at org.apache.cassandra.service.StorageProxy$RangeIterator.computeNext(StorageProxy.java:1992) ~[apache-cassandra-3.11.4.jar:3.11.4]
at org.apache.cassandra.service.StorageProxy$RangeIterator.computeNext(StorageProxy.java:1962) ~[apache-cassandra-3.11.4.jar:3.11.4]
at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[apache-cassandra-3.11.4.jar:3.11.4]
at com.google.common.collect.Iterators$PeekingImpl.hasNext(Iterators.java:1149) ~[guava-18.0.jar:na]
at org.apache.cassandra.service.StorageProxy$RangeMerger.computeNext(StorageProxy.java:2014) ~[apache-cassandra-3.11.4.jar:3.11.4]
at org.apache.cassandra.service.StorageProxy$RangeMerger.computeNext(StorageProxy.java:1999) ~[apache-cassandra-3.11.4.jar:3.11.4]
at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[apache-cassandra-3.11.4.jar:3.11.4]
at org.apache.cassandra.service.StorageProxy$RangeCommandIterator.computeNext(StorageProxy.java:2132) ~[apache-cassandra-3.11.4.jar:3.11.4]
at org.apache.cassandra.service.StorageProxy$RangeCommandIterator.computeNext(StorageProxy.java:2092) ~[apache-cassandra-3.11.4.jar:3.11.4]
at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[apache-cassandra-3.11.4.jar:3.11.4]
at org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:92) ~[apache-cassandra-3.11.4.jar:3.11.4]
at org.apache.cassandra.cql3.statements.SelectStatement.process(SelectStatement.java:786) ~[apache-cassandra-3.11.4.jar:3.11.4]
at org.apache.cassandra.cql3.statements.SelectStatement.processResults(SelectStatement.java:438) ~[apache-cassandra-3.11.4.jar:3.11.4]
at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:416) ~[apache-cassandra-3.11.4.jar:3.11.4]
at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:289) ~[apache-cassandra-3.11.4.jar:3.11.4]
at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:117) ~[apache-cassandra-3.11.4.jar:3.11.4]
at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:225) ~[apache-cassandra-3.11.4.jar:3.11.4]
at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:256) ~[apache-cassandra-3.11.4.jar:3.11.4]
at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:241) ~[apache-cassandra-3.11.4.jar:3.11.4]
at org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:116) ~[apache-cassandra-3.11.4.jar:3.11.4]
at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:566) [apache-cassandra-3.11.4.jar:3.11.4]
at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:410) [apache-cassandra-3.11.4.jar:3.11.4]
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) [netty-all-4.0.44.Final.jar:4.0.44.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357) [netty-all-4.0.44.Final.jar:4.0.44.Final]
at io.netty.channel.AbstractChannelHandlerContext.access$600(AbstractChannelHandlerContext.java:35) [netty-all-4.0.44.Final.jar:4.0.44.Final]
at io.netty.channel.AbstractChannelHandlerContext$7.run(AbstractChannelHandlerContext.java:348) [netty-all-4.0.44.Final.jar:4.0.44.Final]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_252]
at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162) [apache-cassandra-3.11.4.jar:3.11.4]
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:114) [apache-cassandra-3.11.4.jar:3.11.4]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_252]
I'm using the following Cassandra version: [cqlsh 5.0.1 | Cassandra 3.11.4 | CQL spec 3.4.4 | Native protocol v4]
Here are the configuration files:
cassandra.yaml
cluster_name: 'babelfish'
num_tokens: 256
hinted_handoff_enabled: true
hinted_handoff_throttle_in_kb: 1024
max_hints_delivery_threads: 2
hints_flush_period_in_ms: 10000
max_hints_file_size_in_mb: 128
batchlog_replay_throttle_in_kb: 1024
authenticator: AllowAllAuthenticator
authorizer: AllowAllAuthorizer
role_manager: CassandraRoleManager
roles_validity_in_ms: 2000
permissions_validity_in_ms: 2000
credentials_validity_in_ms: 2000
partitioner: org.apache.cassandra.dht.Murmur3Partitioner
data_file_directories:
    - /var/lib/cassandra/data
commitlog_directory: /var/lib/cassandra/commitlog
cdc_enabled: false
disk_failure_policy: stop
commit_failure_policy: stop
prepared_statements_cache_size_mb:
thrift_prepared_statements_cache_size_mb:
key_cache_size_in_mb:
key_cache_save_period: 14400
row_cache_size_in_mb: 0
row_cache_save_period: 0
counter_cache_size_in_mb:
counter_cache_save_period: 7200
saved_caches_directory: /var/lib/cassandra/saved_caches
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000
commitlog_segment_size_in_mb: 32
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "192.168.30.15, 192.168.40.15"
concurrent_reads: 32
concurrent_writes: 32
concurrent_counter_writes: 32
concurrent_materialized_view_writes: 32
memtable_allocation_type: heap_buffers
index_summary_capacity_in_mb:
index_summary_resize_interval_in_minutes: 60
trickle_fsync: false
trickle_fsync_interval_in_kb: 10240
storage_port: 7000
ssl_storage_port: 7001
listen_address: 192.168.40.15
start_native_transport: true
native_transport_port: 9042
start_rpc: false
rpc_address: 192.168.40.15
rpc_port: 9160
rpc_keepalive: true
rpc_server_type: sync
thrift_framed_transport_size_in_mb: 15
incremental_backups: false
snapshot_before_compaction: false
auto_snapshot: true
column_index_size_in_kb: 64
column_index_cache_size_in_kb: 2
compaction_throughput_mb_per_sec: 16
sstable_preemptive_open_interval_in_mb: 50
read_request_timeout_in_ms: 5000
range_request_timeout_in_ms: 10000
write_request_timeout_in_ms: 2000
counter_write_request_timeout_in_ms: 5000
cas_contention_timeout_in_ms: 1000
truncate_request_timeout_in_ms: 60000
request_timeout_in_ms: 10000
slow_query_log_timeout_in_ms: 500
cross_node_timeout: false
endpoint_snitch: GossipingPropertyFileSnitch
dynamic_snitch_update_interval_in_ms: 100
dynamic_snitch_reset_interval_in_ms: 600000
dynamic_snitch_badness_threshold: 0.1
request_scheduler: org.apache.cassandra.scheduler.NoScheduler
server_encryption_options:
    internode_encryption: none
    keystore: conf/.keystore
    keystore_password: cassandra
    truststore: conf/.truststore
    truststore_password: cassandra
client_encryption_options:
    enabled: false
    optional: false
    keystore: conf/.keystore
    keystore_password: cassandra
internode_compression: dc
inter_dc_tcp_nodelay: false
tracetype_query_ttl: 86400
tracetype_repair_ttl: 604800
enable_user_defined_functions: false
enable_scripted_user_defined_functions: false
enable_materialized_views: true
windows_timer_interval: 1
transparent_data_encryption_options:
    enabled: false
    chunk_length_kb: 64
    cipher: AES/CBC/PKCS5Padding
    key_alias: testing:1
    key_provider:
      - class_name: org.apache.cassandra.security.JKSKeyProvider
        parameters:
          - keystore: conf/.keystore
            keystore_password: cassandra
            store_type: JCEKS
            key_password: cassandra
tombstone_warn_threshold: 1000
tombstone_failure_threshold: 100000
batch_size_warn_threshold_in_kb: 5
batch_size_fail_threshold_in_kb: 50
unlogged_batch_across_partitions_warn_threshold: 10
compaction_large_partition_warning_threshold_mb: 100
gc_warn_threshold_in_ms: 1000
back_pressure_enabled: false
back_pressure_strategy:
    - class_name: org.apache.cassandra.net.RateBasedBackPressure
      parameters:
        - high_ratio: 0.90
          factor: 5
          flow: FAST
cassandra-rackdc.properties
# These properties are used with GossipingPropertyFileSnitch and will
# indicate the rack and dc for this node
dc=DC1
rack=RACK1
# Add a suffix to a datacenter name. Used by the Ec2Snitch and Ec2MultiRegionSnitch
# to append a string to the EC2 region name.
#dc_suffix=
# Uncomment the following line to make this snitch prefer the internal ip when possible, as the Ec2MultiRegionSnitch does.
# prefer_local=true
cassandra-topology.properties
# Cassandra Node IP=Data Center:Rack
192.168.30.15=DC1:RACK1
192.168.40.15=DC1:RACK1
# default for unknown nodes
default=DC1:r1
# Native IPv6 is supported, however you must escape the colon in the IPv6 Address
# Also be sure to comment out JVM_OPTS="$JVM_OPTS -Djava.net.preferIPv4Stack=true"
# in cassandra-env.sh
# fe80\:0\:0\:0\:202\:b3ff\:fe1e\:8329=DC1:RAC3
What could be the origin of this error, and how could it be fixed?
Upvotes: 0
Views: 193
Reputation: 87164
If you cloned the virtual machine with all of its data, then the second node now holds all of the first node's data, including the node's identity (the host ID and tokens stored in the system keyspace). Two live nodes claiming the same identity confuse the cluster, which is why each node works alone but things break as soon as both are up.
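A quick way to confirm this diagnosis (a sketch, assuming cqlsh can reach both nodes; the addresses are the ones from the question):

# Run against each node; if both return the same host_id, the clone
# carried over the first node's identity.
cqlsh 192.168.30.15 -e "SELECT host_id FROM system.local;"
cqlsh 192.168.40.15 -e "SELECT host_id FROM system.local;"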
To solve the problem, shut down the 2nd node, delete all data from the data_file_directories and the commit logs, leave only the first node in the seed list, and then start the 2nd node; it will bootstrap and join the cluster normally. After the process has finished, update the seed list again (if you leave the 2nd node in the seed list while it joins, it won't bootstrap into the existing cluster but will start a new one, because seed nodes skip bootstrapping).
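A minimal sketch of that procedure on the 2nd node (192.168.40.15), assuming a package install managed through systemd; the paths come from the cassandra.yaml above:

# 1. Stop Cassandra on the node being rebuilt
sudo systemctl stop cassandra

# 2. Wipe the cloned state (clearing saved caches as well is common practice)
sudo rm -rf /var/lib/cassandra/data/*
sudo rm -rf /var/lib/cassandra/commitlog/*
sudo rm -rf /var/lib/cassandra/saved_caches/*

# 3. In cassandra.yaml, point the seed list at the first node only:
#        - seeds: "192.168.30.15"

# 4. Start the node and watch it join: UJ = joining, UN = up/normal
sudo systemctl start cassandra
nodetool status

Once nodetool status shows the node as UN, you can restore both nodes to the seed list.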
Upvotes: 1