Reputation: 8722
I dropped a column in Cassandra 1.2 couple days ago by: 1. drop the whole table, 2. recreate the table, without the column, 3. insert insert statement (without the column).
The reason why I did that way is because Cassandra 1.2 doesn't support "drop column" operation.
Today I was notified by Ops team because of the data corruption issue. My questions:
How to fix it?
ERROR [ReadStage:79] 2014-11-04 11:29:55,021 CassandraDaemon.java (line 191) Exception in thread Thread[ReadStage:79,5,main] org.apache.cassandra.io.sstable.CorruptSSTableException: org.apache.cassandra.db.ColumnSerializer$CorruptColumnException: invalid column name length 0 (/data/cassandra/data/xxx/yyy/zzz-Data.db, 1799885 bytes remaining) at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:110) at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:40) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) at org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:90) at org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:171) at org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:154) at org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext(MergeIterator.java:199) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:160) at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:136) at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:84) at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:291) at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:65) at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1398) at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1214) at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1130) at org.apache.cassandra.db.Table.getRow(Table.java:344) at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:70) at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:44) at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) Caused by: org.apache.cassandra.db.ColumnSerializer$CorruptColumnException: invalid column name length 0 (/data/cassandra/data/xxx/yyy/zzz-Data.db, 1799885 bytes remaining) at org.apache.cassandra.db.ColumnSerializer$CorruptColumnException.create(ColumnSerializer.java:148) at org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:86) at org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:73) at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:106) ... 24 more ERROR [ReadStage:89] 2014-11-04 11:29:58,076 CassandraDaemon.java (line 191) Exception in thread Thread[ReadStage:89,5,main] java.lang.OutOfMemoryError: Java heap space at org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:376) at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:392) at org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:355) at org.apache.cassandra.db.ColumnSerializer.deserializeColumnBody(ColumnSerializer.java:108) at org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:92) at org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:73) at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:106) at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:40) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) at org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:90) at org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:171) at org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:154) at org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext(MergeIterator.java:199)
Upvotes: 1
Views: 2361
Reputation: 7305
C* 1.2 supports column deletions for cql tables - http://www.datastax.com/documentation/cql/3.0/cql/cql_using/use_delete.html
However, I do not see anything wrong from the procedure you described to re-create a new table without your column. Here are some steps to go forward.
The corruption you are seeing is in the new table not the old one (do they have the same name?)
You have a replication factor and number of nodes that are high enough for you to be able to take this node offline
Your client's load balancing policy is set up appropriately so that when the node goes down it will fail over to another node
1) Take your node offline
nodetool drain
This will flush memtables and make your node stop accepting requests.
2) Run nodetool scrub
nodetool scrub [keyspace][table]
If this completes successfully then you are done, bring your node back-up by restarting cassandra and run a nodetool repair keyspace table
3) If scrub errored out (probably with a corruption error), try the sstablescrub utility. ssh into your box and run:
sstablescrub <keyspace> <table>
Note, run this using the same os user you use to start cassandra.
If this completes successfully then you are done, bring your node back-up by restarting cassandra and run a nodetool repair keyspace table
4) If this doesn't work (again errors out with a corruption error) you will have to remove the SStable and rebuild it from your other replicas using repair:
nodetool repair keyspace cf
-- This repair will take time.Upvotes: 5