Reputation: 113
Cassandra System Log:
ERROR [ReadStage:8468] 2016-05-09 08:58:28,029 SliceQueryFilter.java (line 206) Scanned over 100000 tombstones in AAAAA.EVENT_QUEUE_DATA; query aborted (see tombstone_failure_threshold)
ERROR [ReadStage:8468] 2016-05-09 08:58:28,029 CassandraDaemon.java (line 258) Exception in thread Thread[ReadStage:8468,5,main]
java.lang.RuntimeException: org.apache.cassandra.db.filter.TombstoneOverwhelmingException
at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2008)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Application Log:
! java.net.SocketException: Broken pipe
! at java.net.SocketOutputStream.socketWrite0(Native Method) ~[na:1.8.0_45]
! at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109) ~[na:1.8.0_45]
! at java.net.SocketOutputStream.write(SocketOutputStream.java:153) ~[na:1.8.0_45]
! at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122) ~[na:1.8.0_45]
I don't know the exact cause of this yet. My guess is that a burst of many delete calls to Cassandra might have caused this situation. Any advice would be very helpful to me at this moment. Thanks a lot.
Upvotes: 0
Views: 157
Reputation: 16410
As a temporary workaround, you can increase tombstone_failure_threshold in cassandra.yaml.
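For example (the value below is illustrative; the shipped default is 100000, which is exactly the count in your log):

# cassandra.yaml -- raising the abort threshold buys time, it does not fix the cause
tombstone_failure_threshold: 200000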
My guess from the AAAAA.EVENT_QUEUE_DATA name is that you've implemented a queue. This is an anti-pattern, and it causes exactly what you're describing: every consumed message is deleted, and every delete leaves a tombstone that later reads must scan past. It will continue to get worse and cause a lot of GC pressure and performance problems down the road.
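To illustrate (the schema is hypothetical; I'm inferring it from the table name), a typical consume loop looks like this, and each poll has to skip every tombstone left behind by earlier acknowledgements:

-- hypothetical single-partition queue: PRIMARY KEY (queue_id, msg_id)
SELECT msg_id, payload FROM aaaaa.event_queue_data WHERE queue_id = 'q1' LIMIT 100;  -- poll
DELETE FROM aaaaa.event_queue_data WHERE queue_id = 'q1' AND msg_id = 42;            -- ack = tombstone
-- the next poll starts at the head of the partition and must scan every
-- tombstone left by earlier acks before it reaches a live row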
Knowing that doesn't really help you today, though. I would suggest you increase your failure threshold (above) and update your compaction strategy to help in the future. Here's an idea:
ALTER TABLE footable WITH
  compaction = {'class': 'LeveledCompactionStrategy',
                'sstable_size_in_mb': '256',
                'tombstone_compaction_interval': '14400',
                'unchecked_tombstone_compaction': 'true',
                'tombstone_threshold': '0.05'}
  AND gc_grace_seconds = 14400;  -- assuming you consume everything in the queue within this window of seconds
But you will want to make changes in your application as well. Keep in mind that more aggressive tombstone removal creates a possibility for a delete to be "lost": if a tombstone is purged before a replica that was down has seen it, that replica can resurrect the deleted data. It's not very likely, though, and it's better than being down.
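One common application-side change (a sketch only; the new table and its columns are hypothetical) is to bucket the queue by time window so readers only ever touch the current bucket, letting old data age out via TTL instead of being chased by per-row deletes:

-- hypothetical time-bucketed layout: readers query only the current bucket,
-- so tombstones accumulating in old buckets are never scanned
CREATE TABLE aaaaa.event_queue_data_v2 (
    bucket     text,      -- e.g. '2016-05-09-08' (one partition per hour)
    event_time timeuuid,
    payload    blob,
    PRIMARY KEY (bucket, event_time)
) WITH default_time_to_live = 14400;

-- consumers read the current hour's partition only
SELECT event_time, payload FROM aaaaa.event_queue_data_v2 WHERE bucket = '2016-05-09-08';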
Upvotes: 1
Reputation: 880
Tombstones are generated when you "delete" data: they are logical markers for the delete, part of the mechanism that keeps deleted ("ghost") columns from reappearing. If you delete a lot of data, you can easily hit the tombstone warning and even the error threshold (as in your case). The gc_grace_seconds setting on your table defines the retention time for tombstones. Also, try to avoid selecting everything: make your SELECT statements target actual data instead of broad range queries.
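For example (the schema here is hypothetical), a lower bound on the clustering key lets the read start past rows that have already been deleted, instead of scanning their tombstones from the head of the partition:

-- broad: reads from the start of the partition, scanning every tombstone
SELECT * FROM aaaaa.event_queue_data WHERE queue_id = 'q1';
-- targeted: track the last consumed position and start past it
SELECT * FROM aaaaa.event_queue_data
 WHERE queue_id = 'q1' AND msg_id > 42;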
Upvotes: 0