Vučko

Reputation: 1

Spring Data Cassandra driver gets stuck after a few hours, with a single-node database on the same node

I've been having problems with Apache Cassandra database access via spring-data-cassandra:

The application is a small Spring Boot (1.4.0) server application using Spring Data Cassandra (I tried 1.4.2 and 1.4.4). It collects data from remote clients and implements an administrative GUI based on a REST interface on the server side, including a dashboard recomputed every 10 seconds by Spring @Scheduled tasks and delivered to clients (browsers) over the WebSocket protocol. Traffic is secured with HTTPS and mutual authentication (server and client certificates).
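For context, the dashboard refresh is just a plain Spring scheduled task pushing over STOMP/WebSocket. A minimal sketch of the idea, assuming @EnableScheduling and a STOMP broker are configured elsewhere (the class name, destination and statistics method are made up for illustration, not taken from the actual application):

import org.springframework.messaging.simp.SimpMessagingTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

import java.util.Collections;
import java.util.Map;

@Component
public class DashboardPublisher {

    private final SimpMessagingTemplate messagingTemplate;

    public DashboardPublisher(SimpMessagingTemplate messagingTemplate) {
        this.messagingTemplate = messagingTemplate;
    }

    // Recompute the dashboard every 10 seconds and push it to browsers over STOMP/WebSocket.
    @Scheduled(fixedRate = 10000)
    public void publishDashboard() {
        Map<String, Object> dashboard = computeLastHourStatistics();
        messagingTemplate.convertAndSend("/topic/dashboard", dashboard);
    }

    private Map<String, Object> computeLastHourStatistics() {
        // Placeholder: in the real application this queries Cassandra for the last hour of traffic.
        return Collections.emptyMap();
    }
}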

The current state of the application is being tested against a Cassandra 2.2.8 database running on the same cloud server (connected via the loopback address 127.0.0.1) under Ubuntu 14.04. A couple of test clients generate a load of around 300k database records per hour (50k master and 5x50k detail records), uploading data every 5 seconds or so. The dashboard trawls through the last hour of traffic and produces statistics. Average CPU usage reported by the sar utility is around 10%. The current database size is around 25 GB.

Data is inserted in small batches; a rough sketch follows below. I have also tried individual writes, but the problem did not disappear; CPU usage just increased by around 50% while testing with single writes.
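Roughly, the batched writes are done along these lines with the DataStax driver; the keyspace, table and column names below are placeholders rather than the real schema:

import java.util.List;
import java.util.UUID;

import com.datastax.driver.core.BatchStatement;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class DetailWriter {

    private final Session session;
    private final PreparedStatement insertDetail;

    public DetailWriter(Session session) {
        this.session = session;
        // Prepare once and reuse the statement for every batch.
        this.insertDetail = session.prepare(
                "INSERT INTO mykeyspace.detail_record (master_id, seq, payload) VALUES (?, ?, ?)");
    }

    // One client upload (a handful of detail rows) ends up as one small unlogged batch.
    public void writeDetails(UUID masterId, List<String> payloads) {
        BatchStatement batch = new BatchStatement(BatchStatement.Type.UNLOGGED);
        int seq = 0;
        for (String payload : payloads) {
            batch.add(insertDetail.bind(masterId, seq++, payload));
        }
        session.execute(batch);
    }
}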

I've done a lot of Google "research" on the topic and found nothing specific, but I tried quite a few pieces of advice, e.g. putting the schema name in all queries (see the fragment below) and a couple of configuration options, with apparently no effect on the final outcome (a blocked server needing a restart). The server has run for up to 30 hours or so, but sometimes gets blocked within 1-2 hours; it usually runs 7-10 hours before the driver gets stuck, with no obvious pattern in the running period.
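For the "schema name in all queries" advice, in repository terms that amounts to something like the fragment below; the keyspace, table and the MasterRecord entity are placeholders for illustration only:

import java.util.List;
import java.util.UUID;

import org.springframework.data.cassandra.repository.CassandraRepository;
import org.springframework.data.cassandra.repository.Query;

public interface MasterRecordRepository extends CassandraRepository<MasterRecord> {

    // The table is explicitly qualified with the keyspace instead of relying on the session's logged keyspace.
    @Query("SELECT * FROM mykeyspace.master_record WHERE client_id = ?0")
    List<MasterRecord> findByClientId(UUID clientId);
}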

I've been monitoring the heap and there is nothing in particular to see: no data structures piling up over time. The server is run with -Xms2g -Xmx3g -XX:+PrintGCDetails.

The error eventually appearing is:

Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: inpresec-cassandra/127.0.1.1:9042 (com.datastax.driver.core.OperationTimedOutException: [inpresec-cassandra/127.0.1.1:9042] Operation timed out))
        at com.datastax.driver.core.RequestHandler.reportNoMoreHosts(RequestHandler.java:217) ~[cassandra-driver-core-2.1.9.jar!/:na]
        at com.datastax.driver.core.RequestHandler.access$1000(RequestHandler.java:44) ~[cassandra-driver-core-2.1.9.jar!/:na]
        at com.datastax.driver.core.RequestHandler$SpeculativeExecution.sendRequest(RequestHandler.java:276) ~[cassandra-driver-core-2.1.9.jar!/:na]
        at com.datastax.driver.core.RequestHandler$SpeculativeExecution$1.run(RequestHandler.java:374) ~[cassandra-driver-core-2.1.9.jar!/:na]
        ... 3 common frames omitted

I have also noticed that the Cassandra process reports a virtual memory size roughly matching the size of the database. I first noticed this when the database was around 12 GB, and it has tracked the database size faithfully since; I'm not sure whether this has anything to do with the server problem. The resident set of the database process is between 2 and 3 GB, and the resident set of the server is typically 1.5-2.5 GB. Total memory of the cloud server is 8 GB.

Before running Cassandra directly in the host VM's OS, I ran it in Docker and had the same problem; moving it out of Docker was done to exclude Docker from the "list of suspects".

If anybody has experienced anything similar, I'd appreciate information or advice.

Upvotes: 0

Views: 879

Answers (1)

Vučko

Reputation: 1

The problem has apparently been solved by upgrading Netty and adding the native epoll transport so that it is used instead of the default fallback to NIO. Originally, pom.xml contained:

<dependency>
    <groupId>io.netty</groupId>
    <artifactId>netty-all</artifactId>
    <version>4.0.9.Final</version>
</dependency>

Now this has been changed to:

<dependency>
    <groupId>io.netty</groupId>
    <artifactId>netty-all</artifactId>
    <version>4.0.29.Final</version>
</dependency>

<dependency>
    <groupId>io.netty</groupId>
    <artifactId>netty-transport-native-epoll</artifactId>
    <version>4.0.29.Final</version>
    <!-- Explicitly bring in the linux classifier, test may fail on 32-bit linux -->
    <classifier>linux-x86_64</classifier>
    <scope>test</scope>
</dependency>

adding the second dependency to explicitly include epoll support, as suggested here.

After this change, the original message appearing in the log file:

com.datastax.driver.core.NettyUtil       : Did not find Netty's native epoll transport in the classpath, defaulting to NIO.

has changed into:

com.datastax.driver.core.NettyUtil       : Found Netty's native epoll transport in the classpath, using it

Since then there have been no random failures. I tried "killing" the DB connection several times by issuing extra-large queries; it dutifully reported a memory error and then recovered.
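Beyond the log line above, a quick way to double-check that the native transport can actually load (assuming Netty 4.0.x and the epoll artifact being available on the runtime classpath) is to ask Netty directly; this is only a verification snippet, not part of the fix:

import io.netty.channel.epoll.Epoll;

public class EpollCheck {
    public static void main(String[] args) {
        // Prints true only when netty-transport-native-epoll (with the right classifier)
        // is on the classpath and the native library loads on this platform.
        System.out.println("epoll available: " + Epoll.isAvailable());
        if (!Epoll.isAvailable()) {
            // Explains why the native transport could not be loaded.
            Epoll.unavailabilityCause().printStackTrace();
        }
    }
}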

Upvotes: 0
