Andrei Ivanov

Reputation: 659

TombstoneOverwhelmingException in Cassandra

So I'm getting this exception when querying data from a table. I read a lot online, and from what I understand this happens because I have a lot of null rows. But what's a way to solve this? Can I just easily get rid of all these nulls?

UPDATE: I ran nodetool compact and also tried scrubbing. In both cases I get this:

Exception in thread "main" java.lang.AssertionError: [SSTableReader(path='/var/lib/cassandra/data/bitcoin/okcoin_order_book_btc_usd/bitcoin-okcoin_order_book_btc_usd-jb-538-Data.db'), SSTableReader(path='/var/lib/cassandra/data/bitcoin/okcoin_order_book_btc_usd/bitcoin-okcoin_order_book_btc_usd-jb-710-Data.db'), SSTableReader(path='/var/lib/cassandra/data/bitcoin/okcoin_order_book_btc_usd/bitcoin-okcoin_order_book_btc_usd-jb-627-Data.db'), SSTableReader(path='/var/lib/cassandra/data/bitcoin/okcoin_order_book_btc_usd/bitcoin-okcoin_order_book_btc_usd-jb-437-Data.db')]
at org.apache.cassandra.db.ColumnFamilyStore$13.call(ColumnFamilyStore.java:2132)
at org.apache.cassandra.db.ColumnFamilyStore$13.call(ColumnFamilyStore.java:2129)
at org.apache.cassandra.db.ColumnFamilyStore.runWithCompactionsDisabled(ColumnFamilyStore.java:2111)
at org.apache.cassandra.db.ColumnFamilyStore.markAllCompacting(ColumnFamilyStore.java:2142)
at org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy.getMaximalTask(SizeTieredCompactionStrategy.java:254)
at org.apache.cassandra.db.compaction.CompactionManager.submitMaximal(CompactionManager.java:290)
at org.apache.cassandra.db.compaction.CompactionManager.performMaximal(CompactionManager.java:282)
at org.apache.cassandra.db.ColumnFamilyStore.forceMajorCompaction(ColumnFamilyStore.java:1941)
at org.apache.cassandra.service.StorageService.forceKeyspaceCompaction(StorageService.java:2182)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487)
at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97)
at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328)
at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420)
at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
at sun.rmi.transport.Transport$1.run(Transport.java:177)
at sun.rmi.transport.Transport$1.run(Transport.java:174)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

And these are the last lines from system.log:

INFO [CompactionExecutor:1888] 2015-01-03 07:22:54,272 CompactionController.java (line 192) Compacting large row bitcoin/okcoin_trade_btc_cny:1972-05 (225021398 bytes) incrementally
INFO [CompactionExecutor:1888] 2015-01-03 07:23:07,528 CompactionController.java (line 192) Compacting large row bitcoin/okcoin_trade_btc_cny:1972-06 (217772702 bytes) incrementally
INFO [CompactionExecutor:1888] 2015-01-03 07:23:20,508 CompactionController.java (line 192) Compacting large row bitcoin/okcoin_trade_btc_cny:2014-05 (121911398 bytes) incrementally
INFO [ScheduledTasks:1] 2015-01-03 07:23:30,941 GCInspector.java (line 116) GC for ParNew: 223 ms for 1 collections, 5642103584 used; max is 8375238656
INFO [CompactionExecutor:1888] 2015-01-03 07:23:33,436 CompactionController.java (line 192) Compacting large row bitcoin/okcoin_trade_btc_cny:2014-07 (106408526 bytes) incrementally
INFO [CompactionExecutor:1888] 2015-01-03 07:23:38,787 CompactionController.java (line 192) Compacting large row bitcoin/okcoin_trade_btc_cny:2014-02 (112031822 bytes) incrementally
INFO [CompactionExecutor:1888] 2015-01-03 07:23:46,055 ColumnFamilyStore.java (line 794) Enqueuing flush of Memtable-compactions_in_progress@582986122(0/0 serialized/live bytes, 1 ops)
INFO [FlushWriter:62] 2015-01-03 07:23:46,055 Memtable.java (line 355) Writing Memtable-compactions_in_progress@582986122(0/0 serialized/live bytes, 1 ops)
INFO [FlushWriter:62] 2015-01-03 07:23:46,268 Memtable.java (line 395) Completed flushing /var/lib/cassandra/data/system/compactions_in_progress/system-compactions_in_progress-jb-22-Data.db (42 bytes) for commitlog position ReplayPosition(segmentId=1420135510457, position=14938165)
INFO [CompactionExecutor:1888] 2015-01-03 07:23:46,354 CompactionTask.java (line 287) Compacted 2 sstables to [/var/lib/cassandra/data/bitcoin/okcoin_trade_btc_cny/bitcoin-okcoin_trade_btc_cny-jb-554,].  881,267,752 bytes to 881,266,793 (~99% of original) in 162,878ms = 5.159945MB/s.  24 total partitions merged to 23.  Partition merge counts were {1:22, 2:1, }
WARN [RMI TCP Connection(39)-128.31.5.27] 2015-01-03 07:24:46,452 ColumnFamilyStore.java (line 2103) Unable to cancel in-progress compactions for okcoin_order_book_btc_usd.  Probably there is an unusually large row in progress somewhere.  It is also possible that buggy code left some sstables compacting after it was done with them

I'm not sure what the last line means. There don't seem to be any extremely large rows (though I don't know how to check whether there are). As a note, that compaction is still stuck at 60.33%, on okcoin_order_book_btc_usd. I'm running Cassandra 2.0.11.

Upvotes: 2

Views: 2273

Answers (1)

Andy Tolbert

Reputation: 11638

Tombstones are created when you delete rows or when rows expire (TTL) in Cassandra. They are removed when the SSTables containing them are compacted, after gc_grace_seconds has elapsed for that row.
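As a minimal illustration (this table and its values are made up), each of these writes produces tombstones; note that binding null values in an INSERT or UPDATE, the "null rows" you mention, also writes cell tombstones:

-- Hypothetical table, for illustration only
CREATE TABLE bitcoin.example (
    id text,
    ts timestamp,
    price decimal,
    PRIMARY KEY (id, ts)
);

-- An explicit delete writes a row tombstone
DELETE FROM bitcoin.example WHERE id = 'a' AND ts = '2015-01-01 00:00:00';

-- A cell written with a TTL becomes a tombstone once the TTL elapses
INSERT INTO bitcoin.example (id, ts, price)
    VALUES ('a', '2015-01-02 00:00:00', 300.0) USING TTL 86400;

-- Writing null into a column also creates a cell tombstone
INSERT INTO bitcoin.example (id, ts, price)
    VALUES ('a', '2015-01-03 00:00:00', null);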

There are a few things I can think of to help alleviate the number of tombstones:

  1. Set a lower gc_grace_seconds on the table that has a lot of tombstones. gc_grace_seconds should typically be 1 day greater than your repair interval; if you are repairing more often than that, you could consider lowering gc_grace_seconds (an ALTER TABLE sketch follows this list).
  2. Take a look at how your compactions are going. Do you have a lot of pending compactions? (nodetool -h localhost compactionstats on each node will show this.) It's possible you are falling behind on compactions, so data isn't getting cleaned up as soon as it should be. It may also be worth changing your compaction strategy if appropriate: for example, if you are using SizeTieredCompactionStrategy, look into LeveledCompactionStrategy. That strategy typically causes more compaction activity (so make sure you have SSDs), which could get your tombstones cleaned up faster (also sketched after this list).
  3. Take a look at your data model and the queries you are making. Are you frequently deleting from, or expiring data in, partitions that you also read from frequently? Consider changing your partitioning (primary key) strategy so deleted or expired rows are less likely to sit among your 'live' data. A good example is adding a time/date component to your partition key (see the CREATE TABLE sketch after this list).
  4. Tweak tombstone_failure_threshold in cassandra.yaml. I probably would not do this, as hitting the threshold is a good indication that you need to look at your data.
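For point 1, if for example you run repairs weekly, something like the following lowers gc_grace_seconds from the 10-day default to 8 days on the table from your question (the exact value is an assumption; derive it from your own repair schedule):

ALTER TABLE bitcoin.okcoin_order_book_btc_usd
    WITH gc_grace_seconds = 691200;  -- 8 days; the default is 864000 (10 days)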
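For point 2, switching the compaction strategy is likewise a single statement (whether LCS suits your disks and workload is something to verify first):

ALTER TABLE bitcoin.okcoin_order_book_btc_usd
    WITH compaction = { 'class' : 'LeveledCompactionStrategy' };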
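And for point 3, as a sketch of the time-bucketed partitioning idea (the table and column names here are hypothetical), putting a month bucket in the partition key keeps tombstone-heavy old data out of the partitions you read for current data:

CREATE TABLE bitcoin.order_book_by_month (
    month text,        -- e.g. '2015-01'; the time bucket
    ts timestamp,
    bids text,
    asks text,
    PRIMARY KEY ((month), ts)   -- partition by month, cluster by time
);

-- Queries for recent data only touch the current month's partition,
-- so tombstones accumulating in older partitions are never scanned:
SELECT * FROM bitcoin.order_book_by_month
    WHERE month = '2015-01' AND ts >= '2015-01-03';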

Upvotes: 5
