Reputation: 1
I've been having problems with Apache Cassandra database access via spring-data-cassandra:
The application is a small Spring Boot (1.4.0) server application using Spring Data Cassandra (I tried 1.4.2 and 1.4.4). It collects data from remote clients and implements an administrative GUI based on a REST interface on the server side, including a dashboard prepared every 10 seconds by Spring @Scheduled tasks and delivered to clients (browsers) over the WebSocket protocol. Traffic is secured with HTTPS and mutual authentication (server + client certificates).
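For context, the dashboard part looks roughly like the following sketch (not the actual code; DashboardPublisher, DashboardStats and the /topic/dashboard destination are invented names, and @EnableScheduling is assumed to be configured elsewhere):

// A minimal sketch of the scheduled dashboard push described above,
// with hypothetical names throughout.
import org.springframework.messaging.simp.SimpMessagingTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class DashboardPublisher {

    private final SimpMessagingTemplate messagingTemplate;

    public DashboardPublisher(SimpMessagingTemplate messagingTemplate) {
        this.messagingTemplate = messagingTemplate;
    }

    // Recompute and broadcast dashboard statistics every 10 seconds.
    @Scheduled(fixedRate = 10000)
    public void publishDashboard() {
        DashboardStats stats = computeLastHourStats();
        messagingTemplate.convertAndSend("/topic/dashboard", stats);
    }

    private DashboardStats computeLastHourStats() {
        // hypothetical: query the last hour of traffic and aggregate it
        return new DashboardStats();
    }

    // placeholder payload type
    public static class DashboardStats { }
}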
The application is currently being tested against a Cassandra 2.2.8 database running on the same cloud server (connected via the loopback address 127.0.0.1) under Ubuntu 14.04. A couple of test clients generate load resulting in around 300k database records per hour (50k master and 5x50k detail records) being inserted, uploading data every 5 seconds or so. The dashboard trawls through the last hour of traffic and computes statistics. Average CPU use reported by the sar utility is around 10%. The current database size is around 25GB.
Data inserts are made in small batches. I've also tried individual writes, but the problem didn't disappear; CPU usage just increased by around 50% while testing with single writes.
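The batched writes follow roughly this pattern (a minimal sketch against the DataStax 2.1 driver API; the keyspace, table and column names are invented):

// Sketch of the small-batch write pattern described above: one master row
// plus its detail rows per upload, written as one unlogged batch.
import java.util.Date;
import java.util.List;
import java.util.UUID;

import com.datastax.driver.core.BatchStatement;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class UploadWriter {

    private final Session session;
    private final PreparedStatement insertMaster;
    private final PreparedStatement insertDetail;

    public UploadWriter(Session session) {
        this.session = session;
        // statements prepared once, keyspace-qualified (placeholder keyspace "myks")
        this.insertMaster = session.prepare(
                "INSERT INTO myks.master (id, received_at) VALUES (?, ?)");
        this.insertDetail = session.prepare(
                "INSERT INTO myks.detail (master_id, seq, payload) VALUES (?, ?, ?)");
    }

    public void write(UUID masterId, Date receivedAt, List<String> payloads) {
        BatchStatement batch = new BatchStatement(BatchStatement.Type.UNLOGGED);
        batch.add(insertMaster.bind(masterId, receivedAt));
        int seq = 0;
        for (String payload : payloads) {
            batch.add(insertDetail.bind(masterId, seq++, payload));
        }
        session.execute(batch);
    }
}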
I've done a lot of Google "research" on the topic and found nothing specific, but I've tried quite a few suggestions, e.g. qualifying all queries with the keyspace name, plus a couple of configuration options - with no apparent effect on the final outcome (a blocked server needing a restart). The server has run for up to 30 hours or so, but it sometimes gets blocked within 1-2 hours; it usually runs 7-10 hours before the driver gets stuck, with no obvious pattern to the running period.
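For reference, the timeout-related driver settings are among the configuration knobs relevant to the OperationTimedOutException below; a sketch of where they would be set (the values are illustrative, not a recommendation):

// Sketch: configuring the per-request read timeout on the DataStax 2.1 driver.
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SocketOptions;

public class ClusterFactory {

    public static Session connect() {
        SocketOptions socketOptions = new SocketOptions()
                .setReadTimeoutMillis(20000)   // driver 2.1 default is 12000 ms
                .setConnectTimeoutMillis(5000);

        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")
                .withSocketOptions(socketOptions)
                .build();
        return cluster.connect("myks"); // keyspace name is a placeholder
    }
}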
I've been monitoring the heap - nothing in particular to see, no data structures piling up over time. The server is run with -Xms2g -Xmx3g -XX:+PrintGCDetails.
The error that eventually appears is:
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: inpresec-cassandra/127.0.1.1:9042 (com.datastax.driver.core.OperationTimedOutException: [inpresec-cassandra/127.0.1.1:9042] Operation timed out))
at com.datastax.driver.core.RequestHandler.reportNoMoreHosts(RequestHandler.java:217) ~[cassandra-driver-core-2.1.9.jar!/:na]
at com.datastax.driver.core.RequestHandler.access$1000(RequestHandler.java:44) ~[cassandra-driver-core-2.1.9.jar!/:na]
at com.datastax.driver.core.RequestHandler$SpeculativeExecution.sendRequest(RequestHandler.java:276) ~[cassandra-driver-core-2.1.9.jar!/:na]
at com.datastax.driver.core.RequestHandler$SpeculativeExecution$1.run(RequestHandler.java:374) ~[cassandra-driver-core-2.1.9.jar!/:na]
... 3 common frames omitted
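For what it's worth, the per-host causes can be pulled out of the exception for logging, which is more informative than the summary line; a small diagnostic sketch against the driver 2.1 API:

// Diagnostic sketch: NoHostAvailableException carries the error that caused
// each individual host to be rejected.
import java.net.InetSocketAddress;
import java.util.Map;

import com.datastax.driver.core.exceptions.NoHostAvailableException;

public class ErrorLogger {

    public static void logPerHostErrors(NoHostAvailableException e) {
        for (Map.Entry<InetSocketAddress, Throwable> entry : e.getErrors().entrySet()) {
            System.err.println("Host " + entry.getKey() + " failed: " + entry.getValue());
        }
    }
}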
What I have also noticed is that the Cassandra process reports a virtual memory size approximately matching the size of the database - I noticed this when the database was around 12GB, and it has tracked the database size faithfully since. I'm not sure whether this has anything to do with the server problem. The resident set of the Cassandra process is between 2 and 3GB; the resident set of the server application is typically 1.5-2.5GB. Total memory of the cloud server is 8GB.
Before running Cassandra directly in the host VM's OS, I was running it in Docker and had the same problem - moving out of Docker was done to exclude Docker from the "list of suspects".
If anybody has experienced anything similar, I'd appreciate information or advice.
Upvotes: 0
Views: 879
Reputation: 1
The problem has apparently been solved by upgrading Netty and enabling the native epoll transport to be used instead of the default fallback to NIO. Originally, pom.xml contained:
<dependency>
    <groupId>io.netty</groupId>
    <artifactId>netty-all</artifactId>
    <version>4.0.9.Final</version>
</dependency>
Now this has been changed to:
<dependency>
    <groupId>io.netty</groupId>
    <artifactId>netty-all</artifactId>
    <version>4.0.29.Final</version>
</dependency>
<dependency>
    <groupId>io.netty</groupId>
    <artifactId>netty-transport-native-epoll</artifactId>
    <version>4.0.29.Final</version>
    <!-- Explicitly bring in the linux classifier, test may fail on 32-bit linux -->
    <classifier>linux-x86_64</classifier>
    <scope>test</scope>
</dependency>
adding the second dependency to explicitly include epoll support, as suggested here.
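To verify at runtime that the native transport actually loads, independent of the driver's log line below, a quick check like this can be added (a sketch; Epoll.isAvailable() and Epoll.unavailabilityCause() are part of netty-transport-native-epoll):

// Sketch: checking whether Netty's native epoll transport is usable.
import io.netty.channel.epoll.Epoll;

public class EpollCheck {
    public static void main(String[] args) {
        if (Epoll.isAvailable()) {
            System.out.println("Netty native epoll transport is available");
        } else {
            // unavailabilityCause() returns the Throwable explaining the load failure
            System.out.println("epoll not available: " + Epoll.unavailabilityCause());
        }
    }
}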
After this change, the original message appearing in the log file:
com.datastax.driver.core.NettyUtil : Did not find Netty's native epoll transport in the classpath, defaulting to NIO.
has changed into:
com.datastax.driver.core.NettyUtil : Found Netty's native epoll transport in the classpath, using it
Since then there have been no random failures. I tried "killing" the DB connection by creating extra-large queries several times - it dutifully reported a memory error and then recovered.
Upvotes: 0