Ruby9191

Reputation: 113

Cassandra: Is there a relation between a TombstoneOverwhelmingException in ReadStage and a broken pipe exception?

Cassandra System Log:

ERROR [ReadStage:8468] 2016-05-09 08:58:28,029 SliceQueryFilter.java (line 206) Scanned over 100000 tombstones in AAAAA.EVENT_QUEUE_DATA; query aborted (see tombstone_failure_threshold)
ERROR [ReadStage:8468] 2016-05-09 08:58:28,029 CassandraDaemon.java (line 258) Exception in thread Thread[ReadStage:8468,5,main]
java.lang.RuntimeException: org.apache.cassandra.db.filter.TombstoneOverwhelmingException
at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2008)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Application Log:

! java.net.SocketException: Broken pipe
! at java.net.SocketOutputStream.socketWrite0(Native Method) ~[na:1.8.0_45]
! at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109) ~[na:1.8.0_45]
! at java.net.SocketOutputStream.write(SocketOutputStream.java:153) ~[na:1.8.0_45]
! at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122) ~[na:1.8.0_45]

I don't know the exact cause of this yet. My guess is that a burst of many delete calls to Cassandra might have caused this situation. Any advice would be very helpful to me at this moment. Thanks a lot.

Upvotes: 0

Views: 157

Answers (2)

Chris Lohfink

Reputation: 16410

As a temporary workaround, you can increase tombstone_failure_threshold in cassandra.yaml.
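For reference, these are the relevant settings in cassandra.yaml (the defaults are 1000 for the warning and 100000 for the failure threshold, which matches the "Scanned over 100000 tombstones" line in your log); the raised value below is only an illustration, not a recommendation:

# cassandra.yaml -- per-query tombstone scan limits
tombstone_warn_threshold: 1000        # log a warning after this many tombstones in one read
tombstone_failure_threshold: 500000   # abort the read (TombstoneOverwhelmingException) past this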

My guess from the AAAAA.EVENT_QUEUE_DATA name is that you've implemented a queue. This is an anti-pattern, and it would cause exactly what you're describing. It will keep getting worse and lead to a lot of GC pressure and performance problems down the road.

Knowing that doesn't really help you today, though. I would suggest you increase your failure threshold (above) and update your compaction strategy to help in the future. Here's an idea:

ALTER TABLE footable WITH
  compaction = {'class': 'LeveledCompactionStrategy',
    'sstable_size_in_mb': '256',
    'tombstone_compaction_interval': '14400',
    'unchecked_tombstone_compaction': 'true',
    'tombstone_threshold': '0.05'} AND
  gc_grace_seconds = 14400;  -- assuming you will consume everything in the queue within this window of seconds

But you will also want to make changes in your application; one option is sketched below. Keep in mind that more aggressive tombstone removal creates a possibility for a delete to be "lost", but that is not very likely and is better than being down.
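To make the application-side change concrete, here is one common way out of the queue anti-pattern (only a sketch; the table and column names are hypothetical, not taken from your schema): bucket the queue by time window so consumers read only recent partitions and never scan old, tombstone-heavy ones, and let whole buckets expire via TTL instead of deleting events row by row.

-- Hypothetical re-modelled queue: one partition per (queue, time bucket)
CREATE TABLE aaaaa.event_queue_data_v2 (
    queue_name text,
    bucket     timestamp,       -- e.g. truncated to the hour
    event_id   timeuuid,
    payload    blob,
    PRIMARY KEY ((queue_name, bucket), event_id)
) WITH default_time_to_live = 86400;  -- old events expire instead of being deleted

-- Consumers read only the current bucket, so old tombstones are never scanned
SELECT event_id, payload
FROM aaaaa.event_queue_data_v2
WHERE queue_name = 'orders' AND bucket = '2016-05-09 08:00:00';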

Upvotes: 1

Matija Gobec

Reputation: 880

Tombstones are generated when you "delete" data: they are logical markers for the delete, and they are part of the mechanism that stops deleted data from reappearing (so-called ghost columns). If you delete a lot of data you can easily hit the tombstone warning threshold, and even the failure threshold (as in your case). The gc_grace_seconds setting on your table defines the retention time for tombstones. Also, try to avoid selecting everything: make SELECT statements target the actual data you need instead of issuing wide range queries.
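As a rough illustration of that last point (the column names here are made up, since the actual schema of EVENT_QUEUE_DATA isn't shown): a query bounded to a narrow slice of a partition only has to walk the tombstones inside that slice, while an unbounded read of a heavily-deleted partition scans every tombstone it contains.

-- Scans the whole partition, including every tombstone left by earlier deletes
SELECT * FROM aaaaa.event_queue_data WHERE queue_id = 'orders';

-- Targets only the slice that still holds live data
SELECT * FROM aaaaa.event_queue_data
WHERE queue_id = 'orders' AND event_time > '2016-05-09 08:00:00';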

Upvotes: 0
