Mikhail Karpych

Reputation: 23

DSE 4.6 to 4.7: 1 MUTATION messages dropped in last 5000ms

After upgrading our cluster (4 DCs, Ubuntu 14.04 x64, cpp-driver 2.0.1 as the client in our app) from DSE 4.6 to 4.7, a few lightly loaded nodes started logging "MessagingService.java:888 - 1 MUTATION messages dropped in last 5000ms", together with 1 pending HintedHandoff task in the thread pool dump.

What I have tried:
Running "nodetool truncatehints" on each running node in the cluster
Switching from OpenJDK to Oracle JDK (1.7.0_76-b13)
Decommissioning the node and rejoining it

How can I find this mutation/hint and drop it?

Side notes:
We did not increase the load (version 4.6 worked fine with this load)
We did not decrease the node count
We have SSD-backed storage

Fixed in https://issues.apache.org/jira/browse/CASSANDRA-9129

Upvotes: 1

Views: 4610

Answers (1)

phact

Reputation: 7305

Dropped mutations usually mean that your disks cannot keep up with your ingest rate. At this point it is worth checking whether any thread pools are backing up (usually the flush writers, if this is an I/O issue). This is why Cassandra logs the thread pool status at that moment.

Cassandra is built on a staged event-driven architecture (SEDA), with multiple thread pools that can each handle up to a certain number of tasks in parallel. Pending tasks pile up when more tasks arrive than a pool can handle concurrently. They are eventually processed once the system has the resources to do so, or dropped under extreme circumstances.
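To make the pending/dropped behaviour concrete, here is a toy model of a single stage. It is only a sketch and not how Cassandra is implemented: the function name, the tick-based clock, and the rule "drop anything that has waited longer than the timeout" are assumptions chosen for illustration.

```python
from collections import deque


def simulate_stage(arrivals, capacity, timeout):
    """Toy model of one stage (hypothetical, not Cassandra code).

    arrivals: number of tasks arriving at each tick
    capacity: tasks the pool can process per tick
    timeout:  ticks a task may wait before it is dropped
    Returns (completed, dropped, still_pending).
    """
    queue = deque()  # each entry is the tick at which the task arrived
    completed = 0
    dropped = 0
    for tick, count in enumerate(arrivals):
        queue.extend([tick] * count)
        # Tasks that have waited longer than the timeout are dropped.
        while queue and tick - queue[0] > timeout:
            queue.popleft()
            dropped += 1
        # The pool processes at most `capacity` tasks in this tick.
        for _ in range(min(capacity, len(queue))):
            queue.popleft()
            completed += 1
    return completed, dropped, len(queue)


if __name__ == "__main__":
    # Steady load within capacity: nothing is left pending or dropped.
    print(simulate_stage([1, 1, 1, 1], capacity=2, timeout=1))
    # A burst larger than the pool can absorb: some tasks time out.
    print(simulate_stage([5, 0, 0, 0], capacity=1, timeout=1))
```

In this model a burst that exceeds capacity first shows up as pending tasks, and only turns into drops if the backlog outlives the timeout.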

To see the current status of your thread pools, use nodetool tpstats. Most likely your hints task has already been processed.
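If you want to check this across many nodes, a small script can pull the pending and dropped counters out of the tpstats text. The sketch below assumes the Cassandra 2.x layout (a "Pool Name" table followed by a "Message type / Dropped" table); other versions may format the output differently. The function name and the sample text are made up for illustration, not copied from a real node.

```python
def parse_tpstats(text):
    """Return (pending_by_pool, dropped_by_message_type) from tpstats text.

    Assumes the 2.x-style layout: pool rows are
    'name active pending completed blocked all_time_blocked' and
    dropped rows are 'message_type count'.
    """
    pending = {}
    dropped = {}
    section = None
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if line.startswith("Pool Name"):
            section = "pools"
            continue
        if line.startswith("Message type"):
            section = "dropped"
            continue
        if section == "pools" and len(parts) >= 3 and parts[2].isdigit():
            pending[parts[0]] = int(parts[2])
        elif section == "dropped" and len(parts) >= 2 and parts[1].isdigit():
            dropped[parts[0]] = int(parts[1])
    return pending, dropped


# Illustrative sample only; the numbers are invented.
SAMPLE = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     0         0          12345         0                 0
HintedHandoff                     0         1              4         0                 0
MemtableFlushWriter               0         0             37         0                 0

Message type           Dropped
READ                         0
MUTATION                     1
"""

if __name__ == "__main__":
    pending, dropped = parse_tpstats(SAMPLE)
    print({pool: n for pool, n in pending.items() if n > 0})
    print({msg: n for msg, n in dropped.items() if n > 0})
```

On a live node you would feed it the text printed by nodetool tpstats instead of SAMPLE, for example by capturing the command's output with subprocess.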

The fact that you were accumulating hints implies that some of your nodes were down, and the hints are now being replayed for consistency since those nodes have come back up.

Your core issue is the dropped mutations. Consider one of the following actions if you continue to see this:

  • Add nodes
  • Get better storage (don't use shared storage such as Amazon EBS; SSDs are faster than spinning disks)
  • Decrease your workload
  • Make sure you are loading with best practices (a good data model that spreads out the load, a DataStax driver with load balancing, etc.)

Upvotes: 3
