Single-node container goes unresponsive after 24 hours of continuous inserts

We are running Cassandra as a single-node container alongside other microservices. Data streams into one of the microservices, which writes it continuously to Cassandra using the gocql driver.

docker-compose service definition for Cassandra:

 ia_cassandradb:
    image: cassandra:4.1.6
    container_name: ia_cassandradb
    hostname: ia_cassandradb
    restart: unless-stopped
    ipc: "none"
    read_only: true
    security_opt:
    - no-new-privileges
    command: |
      /bin/bash -c "
      cp -r /etc/cassandra/* /etc/cassandra_tmp/.
      cp /etc/cassandra/cassandra_bk.yaml /etc/cassandra_tmp/cassandra.yaml
      if [[ $DEV_MODE == true ]]
      then
        sed -i '/client_encryption_options:/{n;s/.*/  enabled: false/}' /etc/cassandra_tmp/cassandra.yaml
      else
        cd /run/secrets/Cassandra
        rm -rf /etc/cassandra_tmp/dse-truststore.jks /etc/cassandra_tmp/secrets.p12 /etc/cassandra_tmp/certkey-keystore.jks
        keytool -keystore /etc/cassandra_tmp/dse-truststore.jks -storetype PKCS12 -importcert -file 'cacert.pem' -alias cacert -storepass cassandra -noprompt
        openssl pkcs12 -export -in DataStore_Server_server_certificate.pem -inkey DataStore_Server_server_key.pem -name localhost -out /etc/cassandra_tmp/secrets.p12 -password pass:cassandra
        keytool -importkeystore -deststorepass cassandra -destkeystore /etc/cassandra_tmp/certkey-keystore.jks -srckeystore /etc/cassandra_tmp/secrets.p12 -srcstoretype PKCS12 -srcstorepass cassandra
      fi
      while true; do nodetool clearsnapshot --all; nodetool compact; nodetool cleanup; sleep 600; done &
      /usr/local/bin/docker-entrypoint.sh
      "
    networks:
    - net_iso
    environment:
      AppName: "DataStore"
      CASSANDRA_CONF: /etc/cassandra_tmp
      DEV_MODE: ${DEV_MODE}
    volumes:
      # Mounts for Cassandra data directory and configuration
    - ./Certificates/Server_Certs/:/run/secrets/Cassandra:rw
    - vol_cassandra_tmp:/tmp
    - vol_cassandra_log:/opt/cassandra/logs
    - vol_cassandra_etc:/etc/cassandra_tmp
    - ${EII_INSTALL_PATH}/data/cassandra:/var/lib/cassandra/data
    - ../config_files/cassandra/cassandra.yaml:/etc/cassandra/cassandra_bk.yaml:ro

System configuration

CPU: 13th Gen Intel(R) Core(TM) i9-13900K
RAM: 32 GB
Storage: 1 TB NVMe SSD

It's the same issue on other systems as well; we also tried a 12th-gen machine with 64 GB RAM.

Data is streamed continuously to the DataStore microservice, which writes to the Cassandra container. We have 120-150 writes/sec, consisting of both JSON and blob data, coming to nearly 1 MB per write. There are also subsequent reads happening.
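For scale, a quick back-of-the-envelope calculation of what that write rate implies for a single node (assuming the upper bound of ~150 writes/sec at ~1 MB each, figures taken from the question):

```go
package main

import "fmt"

func main() {
	// Assumed figures from the question: ~150 writes/sec, ~1 MB per write.
	const writesPerSec = 150.0
	const mbPerWrite = 1.0

	mbPerSec := writesPerSec * mbPerWrite // sustained ingest rate in MB/s
	tbPerDay := mbPerSec * 86400 / 1e6    // 86400 seconds/day, 1e6 MB per TB

	fmt.Printf("~%.0f MB/s ingest, ~%.1f TB/day before compaction overhead\n",
		mbPerSec, tbPerDay)
}
```

If most of that data is retained rather than overwritten or TTL'd, this raw rate would fill the 1 TB SSD within a couple of hours, so compaction and disk pressure on a single node would be severe.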

Expected behavior: all writes should succeed, all the time.

Actual behavior: initially, writes are very fast. After a few hours of execution, writes start to slow down; after that, writes start to fail and Cassandra goes unresponsive (we cannot connect even with cqlsh). This happens within 24 hours.

An explicit restart of the container restores Cassandra.

Upvotes: 2

Views: 52

Answers (1)

Erick Ramirez

Reputation: 16293

There's really not enough detail in your post to help you, but my best guess is that the node gets overloaded and eventually stops responding.

It doesn't look like you've configured the heap for Cassandra, so it will get capped at 8 GB maximum. That's too small a heap for G1 GC, which is the default garbage collector.

G1 GC performs best with large heaps, so consider bumping MAX_HEAP_SIZE to 16-24 GB.
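For example, with the official `cassandra` image the heap can be set via environment variables read by `cassandra-env.sh`. A sketch against the compose file above (note that `cassandra-env.sh` requires `MAX_HEAP_SIZE` and `HEAP_NEWSIZE` to be set as a pair, even though `HEAP_NEWSIZE` is only used by CMS and ignored under G1; exact sizes depend on your workload):

```yaml
    environment:
      MAX_HEAP_SIZE: "16G"
      HEAP_NEWSIZE: "800M"  # required alongside MAX_HEAP_SIZE; ignored by G1 GC
```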

Furthermore, don't expect much from a single-node cluster. If the node is in fact getting overloaded (you will need to do your own analysis since you haven't provided relevant details), the recommended minimum is three nodes. Cheers!

Upvotes: 1
