ramo
ramo

Reputation: 363

Datastax 4.8 Search node - cqlsh connection error

I've recently installed on a three node cluster DSE 4.8.9. The cluster was running fine and was healthy. We've started deleting 4 million records out of 14 millions, so created quite some tombstones. After restarting one of the nodes (10.0.106.7), the node is not exposing port 9042 anymore so i can't connect via cqlsh . Port 7199 is exposed.

The machines setup: 150 GB data drive, 10 GB commitlog on a seperate disc, 32 GB RAM, 8 cores

I'm observing quite a high GC activitiy in the system.log. you can see the system.log on pastebin

this seems to cause that c* is not exposing port 9042 anymore.

I've tried to let the machine run in this state for several hours. The systemlog stays active, but cqlsh is not able to connect. I'm receiving

Connection error: ('Unable to connect to any servers', {'10.0.106.7': error(111, "Tried connecting to [('10.0.106.7', 9042)]. Last error: Connection refused")})

when connecting from the same host to cqlsh.

This leads to the issue, that opscenter can't connect to this instance.

Any suggestions what else I can do to bring cqlsh on this node back?

UPDATE: output of nodetool status:

UN  10.0.106.5  80.46 GB   1       ?       ec3f6f84-41bc-4ae5-85a1-59df023308a7  rack1
UN  10.0.106.6  67.02 GB   1       ?       47388e88-6079-4926-95a6-f4e7627a2037  rack1
UN  10.0.106.7  87.47 GB   1       ?       651c6633-0948-499c-8f7a-98041c87cfb2  rack1

UPDATE2:

netstat -an | grep 9042 returns nothing

excerpt from output.log

INFO  20:37:37,178  Loading settings from file:/etc/dse/cassandra/cassandra.yaml
INFO  20:37:37,287  Node configuration:[authenticator=AllowAllAuthenticator; authorizer=AllowAllAuthorizer; auto_snapshot=true; batch_size_warn_threshold_in_kb=64; batchlog_replay_throttle_in_kb=1024; cas_contention_timeout_in_ms=1000; client_encryption_options=; cluster_name=Gjallarhorn-Public-Cluster; column_index_size_in_kb=64; commit_failure_policy=stop; commitlog_directory=/srv/commitLog; commitlog_segment_size_in_mb=32; commitlog_sync=periodic; commitlog_sync_period_in_ms=10000; commitlog_total_space_in_mb=8192; compaction_large_partition_warning_threshold_mb=100; compaction_throughput_mb_per_sec=16; concurrent_counter_writes=32; concurrent_reads=32; concurrent_writes=32; counter_cache_save_period=7200; counter_cache_size_in_mb=null; counter_write_request_timeout_in_ms=5000; cross_node_timeout=false; data_file_directories=[/srv/cassandra/data]; disk_failure_policy=stop; dynamic_snitch_badness_threshold=0.1; dynamic_snitch_reset_interval_in_ms=600000; dynamic_snitch_update_interval_in_ms=100; endpoint_snitch=com.datastax.bdp.snitch.DseSimpleSnitch; hinted_handoff_enabled=true; hinted_handoff_throttle_in_kb=1024; incremental_backups=false; index_summary_capacity_in_mb=null; index_summary_resize_interval_in_minutes=60; initial_token=3074457345618258602; inter_dc_tcp_nodelay=false; internode_compression=all; key_cache_save_period=14400; key_cache_size_in_mb=null; listen_address=10.0.106.7; max_hint_window_in_ms=10800000; max_hints_delivery_threads=2; memtable_allocation_type=heap_buffers; native_transport_port=9042; num_tokens=1; partitioner=org.apache.cassandra.dht.Murmur3Partitioner; permissions_validity_in_ms=2000; range_request_timeout_in_ms=10000; read_request_timeout_in_ms=5000; request_scheduler=org.apache.cassandra.scheduler.NoScheduler; request_timeout_in_ms=10000; row_cache_save_period=0; row_cache_size_in_mb=0; rpc_address=10.0.106.7; rpc_keepalive=true; rpc_port=9160; rpc_server_type=sync; saved_caches_directory=/srv/cassandra/saved_caches; seed_provider=[{class_name=org.apache.cassandra.locator.SimpleSeedProvider, parameters=[{seeds=10.0.106.5,10.0.106.6,10.0.106.7}]}]; server_encryption_options=; snapshot_before_compaction=false; ssl_storage_port=7001; sstable_preemptive_open_interval_in_mb=50; start_native_transport=true; start_rpc=true; storage_port=7000; thrift_framed_transport_size_in_mb=15; tombstone_failure_threshold=100000; tombstone_warn_threshold=1000; trickle_fsync=false; trickle_fsync_interval_in_kb=10240; truncate_request_timeout_in_ms=60000; unlogged_batch_across_partitions_warn_threshold=10; write_request_timeout_in_ms=2000]
INFO  20:37:37,345  DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
INFO  20:37:37,402  Global memtable on-heap threshold is enabled at 5632MB
INFO  20:37:37,402  Global memtable off-heap threshold is enabled at 5632MB
INFO  20:37:37,406  Detected search service is enabled, setting my workload to Search
INFO  20:37:37,407  Detected search service is enabled, setting my DC to Solr
INFO  20:37:37,408  Initialized DseDelegateSnitch with workload Search, delegating to com.datastax.bdp.snitch.DseSimpleSnitch

ps -ef|grep cassandra

 
/usr/lib/jvm/java-8-oracle/jre//bin/java -Ddse.system_memory_in_mb=29450 -Dcassandra.config.loader=com.datastax.bdp.config.DseConfigurationLoader -Ddse.system_memory_in_mb=29450 -Dcassandra.config.loader=com.datastax.bdp.config.DseConfigurationLoader -ea -javaagent:/usr/share/dse/cassandra/lib/jamm-0.3.0.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms22G -Xmx22G -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:+AlwaysPreTouch -XX:-UseBiasedLocking -XX:StringTableSize=1000003 -XX:+UseTLAB -XX:+ResizeTLAB -XX:CompileCommandFile=/etc/dse/cassandra/hotspot_compiler -XX:+UseG1GC -XX:G1RSetUpdatingPauseTimePercent=5 -XX:MaxGCPauseMillis=1000 -Djava.net.preferIPv4Stack=true -Dcassandra.jmx.local.port=7199 -XX:+DisableExplicitGC -Dlogback.configurationFile=logback.xml -Dcassandra.logdir=/var/log/cassandra -Dcassandra.storagedir= -Dcassandra-pidfile=/var/run/dse/dse.pid -cp :/usr/share/dse/dse-core-4.8.9.jar:/usr/share/dse/dse-hadoop-4.8.9.jar:/usr/share/dse/dse-hive-4.8.9.jar:/usr/share/dse/dse-search-4.8.9.jar:/usr/share/dse/dse-spark-4.8.9.jar:/usr/share/dse/dse-sqoop-4.8.9.jar:/usr/share/dse/common/HdrHistogram-1.2.1.1.jar:/usr/share/dse/common/antlr-2.7.7.jar:/usr/share/dse/common/antlr-3.2.jar:/usr/share/dse/common/antlr-runtime-3.2.jar:/usr/share/dse/common/aopalliance-1.0.jar:/usr/share/dse/common/api-asn1-api-1.0.0-M33.jar:/usr/share/dse/common/api-asn1-ber-1.0.0-M33.jar:/usr/share/dse/common/api-i18n-1.0.0-M33.jar:/usr/share/dse/common/api-ldap-client-api-1.0.0-M33.jar:/usr/share/dse/common/api-ldap-codec-core-1.0.0-M33.jar:/usr/share/dse/common/api-ldap-codec-standalone-1.0.0-M33.jar:/usr/share/dse/common/api-ldap-extras-aci-1.0.0-M33.jar:/usr/share/dse/common/api-ldap-extras-codec-1.0.0-M33.jar:/usr/share/dse/common/api-ldap-extras-codec-api-1.0.0-M33.jar:/usr/share/dse/common/api-ldap-model-1.0.0-M33.jar:/usr/share/dse/common/api-ldap-net-mina-1.0.0-M33.jar:/usr/share/dse/common/api-util-1.0.0-M33.jar:/usr/share/dse/common/asm-5.0.3.jar:/usr/share/dse/common/commons-beanutils-1.9.2.jar:/usr/share/dse/common/commons-codec-1.9.jar:/usr/share/dse/common/commons-collections-3.2.2.jar:/usr/share/dse/common/commons-compiler-2.6.1.jar:/usr/share/dse/common/commons-configuration-1.6.jar:/usr/share/dse/common/commons-digester-1.8.jar:/usr/share/dse/common/commons-io-2.4.jar:/usr/share/dse/common/commons-lang-2.6.jar:/usr/share/dse/common/commons-logging-1.1.1.jar:/usr/share/dse/common/commons-pool-1.6.jar:/usr/share/dse/common/guava-16.0.1.jar:/usr/share/dse/common/guice-3.0.jar:/usr/share/dse/common/guice-multibindings-3.0.jar:/usr/share/dse/common/jackson-annotations-2.2.2.jar:/usr/share/dse/common/jackson-core-2.2.2.jar:/usr/share/dse/common/jackson-databind-2.2.2.jar:/usr/share/dse/common/janino-2.6.1.jar:/usr/share/dse/common/java-uuid-generator-3.1.3.jar:/usr/share/dse/common/javassist-3.18.2-GA.jar:/usr/share/dse/common/javax.inject-1.jar:/usr/share/dse/common/jbcrypt-0.4d.jar:/usr/share/dse/common/jcl-over-slf4j-1.7.10.jar:/usr/share/dse/common/jline-1.0.jar:/usr/share/dse/common/journalio-1.4.2.jar:/usr/share/dse/common/jsr305-2.0.1.jar:/usr/share/dse/common/kmip-1.7.1e.jar:/usr/share/dse/common/log4j-1.2.13.jar:/usr/share/dse/common/mina-core-2.0.10.jar:/usr/share/dse/common/org.apache.servicemix.bundles.antlr-2.7.7_5.jar:/usr/share/dse/common/reflections-0.9.10.jar:/usr/share/dse/common/slf4j-api-1.7.10.jar:/usr/share/dse/common/stringtemplate-3.2.jar:/usr/share/dse/common/validation-api-1.1.0.Final.jar:/etc/dse:/etc/dse/cassandra:/usr/share/dse/cassandra/tools/lib/stress.jar:/usr/share/dse/cassandra/lib/ST4-4.0.8.jar:/usr/share/dse/cassandra/lib/antlr-3.5.2.jar:/usr/share/dse/cassandra/lib/antlr-runtime-3.5.2.jar:/usr/share/dse/cassandra/lib/cassandra-all-2.1.15.1403.jar:/usr/share/dse/cassandra/lib/cassandra-clientutil-2.1.15.1403.jar:/usr/share/dse/cassandra/lib/cassandra-thrift-2.1.15.1403.jar:/usr/share/dse/cassandra/lib/commons-cli-1.1.jar:/usr/share/dse/cassandra/lib/commons-codec-1.9.jar:/usr/share/dse/cassandra/lib/commons-lang-2.6.jar:/usr/share/dse/cassandra/lib/commons-lang3-3.1.jar:/usr/share

Upvotes: 0

Views: 231

Answers (2)

ramo
ramo

Reputation: 363

After letting the machine run for 1,5 days, the machine seems to be back. I can't find the reason, just the behavirou with extrem GC operations, so could be a gc death spiral

Upvotes: 0

Dip
Dip

Reputation: 344

I had the same problem some times in the past while using and it is fixed now. I would suggest few options here:

  • Check if any of the services is using the port 9042.
  • Stop and restart Java and then restart c*. (This has helped me quite a few times)

Check if you do nodetool status what can you see.

The setting for this port number is present in the file cassandra.yaml in the location /etc/dse/cassandra/ for Ubuntu. There is a line native_transport_port: 9042. Try changing that and restarting c*.

Share your update and we may try helping.

Upvotes: 0

Related Questions