Reputation: 363
I've recently installed on a three node cluster DSE 4.8.9. The cluster was running fine and was healthy. We've started deleting 4 million records out of 14 millions, so created quite some tombstones. After restarting one of the nodes (10.0.106.7), the node is not exposing port 9042 anymore so i can't connect via cqlsh . Port 7199 is exposed.
The machines setup: 150 GB data drive, 10 GB commitlog on a seperate disc, 32 GB RAM, 8 cores
I'm observing quite a high GC activitiy in the system.log. you can see the system.log on pastebin
this seems to cause that c* is not exposing port 9042 anymore.
I've tried to let the machine run in this state for several hours. The systemlog stays active, but cqlsh is not able to connect. I'm receiving
Connection error: ('Unable to connect to any servers', {'10.0.106.7': error(111, "Tried connecting to [('10.0.106.7', 9042)]. Last error: Connection refused")})
when connecting from the same host to cqlsh.
This leads to the issue, that opscenter can't connect to this instance.
Any suggestions what else I can do to bring cqlsh on this node back?
UPDATE: output of nodetool status:
UN 10.0.106.5 80.46 GB 1 ? ec3f6f84-41bc-4ae5-85a1-59df023308a7 rack1 UN 10.0.106.6 67.02 GB 1 ? 47388e88-6079-4926-95a6-f4e7627a2037 rack1 UN 10.0.106.7 87.47 GB 1 ? 651c6633-0948-499c-8f7a-98041c87cfb2 rack1
UPDATE2:
netstat -an | grep 9042 returns nothing
excerpt from output.log
INFO 20:37:37,178 Loading settings from file:/etc/dse/cassandra/cassandra.yaml INFO 20:37:37,287 Node configuration:[authenticator=AllowAllAuthenticator; authorizer=AllowAllAuthorizer; auto_snapshot=true; batch_size_warn_threshold_in_kb=64; batchlog_replay_throttle_in_kb=1024; cas_contention_timeout_in_ms=1000; client_encryption_options=; cluster_name=Gjallarhorn-Public-Cluster; column_index_size_in_kb=64; commit_failure_policy=stop; commitlog_directory=/srv/commitLog; commitlog_segment_size_in_mb=32; commitlog_sync=periodic; commitlog_sync_period_in_ms=10000; commitlog_total_space_in_mb=8192; compaction_large_partition_warning_threshold_mb=100; compaction_throughput_mb_per_sec=16; concurrent_counter_writes=32; concurrent_reads=32; concurrent_writes=32; counter_cache_save_period=7200; counter_cache_size_in_mb=null; counter_write_request_timeout_in_ms=5000; cross_node_timeout=false; data_file_directories=[/srv/cassandra/data]; disk_failure_policy=stop; dynamic_snitch_badness_threshold=0.1; dynamic_snitch_reset_interval_in_ms=600000; dynamic_snitch_update_interval_in_ms=100; endpoint_snitch=com.datastax.bdp.snitch.DseSimpleSnitch; hinted_handoff_enabled=true; hinted_handoff_throttle_in_kb=1024; incremental_backups=false; index_summary_capacity_in_mb=null; index_summary_resize_interval_in_minutes=60; initial_token=3074457345618258602; inter_dc_tcp_nodelay=false; internode_compression=all; key_cache_save_period=14400; key_cache_size_in_mb=null; listen_address=10.0.106.7; max_hint_window_in_ms=10800000; max_hints_delivery_threads=2; memtable_allocation_type=heap_buffers; native_transport_port=9042; num_tokens=1; partitioner=org.apache.cassandra.dht.Murmur3Partitioner; permissions_validity_in_ms=2000; range_request_timeout_in_ms=10000; read_request_timeout_in_ms=5000; request_scheduler=org.apache.cassandra.scheduler.NoScheduler; request_timeout_in_ms=10000; row_cache_save_period=0; row_cache_size_in_mb=0; rpc_address=10.0.106.7; rpc_keepalive=true; rpc_port=9160; rpc_server_type=sync; saved_caches_directory=/srv/cassandra/saved_caches; seed_provider=[{class_name=org.apache.cassandra.locator.SimpleSeedProvider, parameters=[{seeds=10.0.106.5,10.0.106.6,10.0.106.7}]}]; server_encryption_options=; snapshot_before_compaction=false; ssl_storage_port=7001; sstable_preemptive_open_interval_in_mb=50; start_native_transport=true; start_rpc=true; storage_port=7000; thrift_framed_transport_size_in_mb=15; tombstone_failure_threshold=100000; tombstone_warn_threshold=1000; trickle_fsync=false; trickle_fsync_interval_in_kb=10240; truncate_request_timeout_in_ms=60000; unlogged_batch_across_partitions_warn_threshold=10; write_request_timeout_in_ms=2000] INFO 20:37:37,345 DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap INFO 20:37:37,402 Global memtable on-heap threshold is enabled at 5632MB INFO 20:37:37,402 Global memtable off-heap threshold is enabled at 5632MB INFO 20:37:37,406 Detected search service is enabled, setting my workload to Search INFO 20:37:37,407 Detected search service is enabled, setting my DC to Solr INFO 20:37:37,408 Initialized DseDelegateSnitch with workload Search, delegating to com.datastax.bdp.snitch.DseSimpleSnitch
ps -ef|grep cassandra
/usr/lib/jvm/java-8-oracle/jre//bin/java -Ddse.system_memory_in_mb=29450 -Dcassandra.config.loader=com.datastax.bdp.config.DseConfigurationLoader -Ddse.system_memory_in_mb=29450 -Dcassandra.config.loader=com.datastax.bdp.config.DseConfigurationLoader -ea -javaagent:/usr/share/dse/cassandra/lib/jamm-0.3.0.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms22G -Xmx22G -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:+AlwaysPreTouch -XX:-UseBiasedLocking -XX:StringTableSize=1000003 -XX:+UseTLAB -XX:+ResizeTLAB -XX:CompileCommandFile=/etc/dse/cassandra/hotspot_compiler -XX:+UseG1GC -XX:G1RSetUpdatingPauseTimePercent=5 -XX:MaxGCPauseMillis=1000 -Djava.net.preferIPv4Stack=true -Dcassandra.jmx.local.port=7199 -XX:+DisableExplicitGC -Dlogback.configurationFile=logback.xml -Dcassandra.logdir=/var/log/cassandra -Dcassandra.storagedir= -Dcassandra-pidfile=/var/run/dse/dse.pid -cp :/usr/share/dse/dse-core-4.8.9.jar:/usr/share/dse/dse-hadoop-4.8.9.jar:/usr/share/dse/dse-hive-4.8.9.jar:/usr/share/dse/dse-search-4.8.9.jar:/usr/share/dse/dse-spark-4.8.9.jar:/usr/share/dse/dse-sqoop-4.8.9.jar:/usr/share/dse/common/HdrHistogram-1.2.1.1.jar:/usr/share/dse/common/antlr-2.7.7.jar:/usr/share/dse/common/antlr-3.2.jar:/usr/share/dse/common/antlr-runtime-3.2.jar:/usr/share/dse/common/aopalliance-1.0.jar:/usr/share/dse/common/api-asn1-api-1.0.0-M33.jar:/usr/share/dse/common/api-asn1-ber-1.0.0-M33.jar:/usr/share/dse/common/api-i18n-1.0.0-M33.jar:/usr/share/dse/common/api-ldap-client-api-1.0.0-M33.jar:/usr/share/dse/common/api-ldap-codec-core-1.0.0-M33.jar:/usr/share/dse/common/api-ldap-codec-standalone-1.0.0-M33.jar:/usr/share/dse/common/api-ldap-extras-aci-1.0.0-M33.jar:/usr/share/dse/common/api-ldap-extras-codec-1.0.0-M33.jar:/usr/share/dse/common/api-ldap-extras-codec-api-1.0.0-M33.jar:/usr/share/dse/common/api-ldap-model-1.0.0-M33.jar:/usr/share/dse/common/api-ldap-net-mina-1.0.0-M33.jar:/usr/share/dse/common/api-util-1.0.0-M33.jar:/usr/share/dse/common/asm-5.0.3.jar:/usr/share/dse/common/commons-beanutils-1.9.2.jar:/usr/share/dse/common/commons-codec-1.9.jar:/usr/share/dse/common/commons-collections-3.2.2.jar:/usr/share/dse/common/commons-compiler-2.6.1.jar:/usr/share/dse/common/commons-configuration-1.6.jar:/usr/share/dse/common/commons-digester-1.8.jar:/usr/share/dse/common/commons-io-2.4.jar:/usr/share/dse/common/commons-lang-2.6.jar:/usr/share/dse/common/commons-logging-1.1.1.jar:/usr/share/dse/common/commons-pool-1.6.jar:/usr/share/dse/common/guava-16.0.1.jar:/usr/share/dse/common/guice-3.0.jar:/usr/share/dse/common/guice-multibindings-3.0.jar:/usr/share/dse/common/jackson-annotations-2.2.2.jar:/usr/share/dse/common/jackson-core-2.2.2.jar:/usr/share/dse/common/jackson-databind-2.2.2.jar:/usr/share/dse/common/janino-2.6.1.jar:/usr/share/dse/common/java-uuid-generator-3.1.3.jar:/usr/share/dse/common/javassist-3.18.2-GA.jar:/usr/share/dse/common/javax.inject-1.jar:/usr/share/dse/common/jbcrypt-0.4d.jar:/usr/share/dse/common/jcl-over-slf4j-1.7.10.jar:/usr/share/dse/common/jline-1.0.jar:/usr/share/dse/common/journalio-1.4.2.jar:/usr/share/dse/common/jsr305-2.0.1.jar:/usr/share/dse/common/kmip-1.7.1e.jar:/usr/share/dse/common/log4j-1.2.13.jar:/usr/share/dse/common/mina-core-2.0.10.jar:/usr/share/dse/common/org.apache.servicemix.bundles.antlr-2.7.7_5.jar:/usr/share/dse/common/reflections-0.9.10.jar:/usr/share/dse/common/slf4j-api-1.7.10.jar:/usr/share/dse/common/stringtemplate-3.2.jar:/usr/share/dse/common/validation-api-1.1.0.Final.jar:/etc/dse:/etc/dse/cassandra:/usr/share/dse/cassandra/tools/lib/stress.jar:/usr/share/dse/cassandra/lib/ST4-4.0.8.jar:/usr/share/dse/cassandra/lib/antlr-3.5.2.jar:/usr/share/dse/cassandra/lib/antlr-runtime-3.5.2.jar:/usr/share/dse/cassandra/lib/cassandra-all-2.1.15.1403.jar:/usr/share/dse/cassandra/lib/cassandra-clientutil-2.1.15.1403.jar:/usr/share/dse/cassandra/lib/cassandra-thrift-2.1.15.1403.jar:/usr/share/dse/cassandra/lib/commons-cli-1.1.jar:/usr/share/dse/cassandra/lib/commons-codec-1.9.jar:/usr/share/dse/cassandra/lib/commons-lang-2.6.jar:/usr/share/dse/cassandra/lib/commons-lang3-3.1.jar:/usr/share
Upvotes: 0
Views: 231
Reputation: 363
After letting the machine run for 1,5 days, the machine seems to be back. I can't find the reason, just the behavirou with extrem GC operations, so could be a gc death spiral
Upvotes: 0
Reputation: 344
I had the same problem some times in the past while using and it is fixed now. I would suggest few options here:
Check if you do nodetool status
what can you see.
The setting for this port number is present in the file cassandra.yaml
in the location /etc/dse/cassandra/
for Ubuntu. There is a line native_transport_port: 9042
. Try changing that and restarting c*.
Share your update and we may try helping.
Upvotes: 0