Reputation: 5684
I was loading a large CSV into Cassandra using cassandra-loader.
The VM ran out of disk space during this process and crashed. I allocated more disk space to the VM and tried starting cassandra but it refused to start due to problems with SSTables and commit log.
I could not run nodetool repair
as it is only works when the node is online.
I ran sstablescrub
which took about 1 hour to finish. So I thought it might have fixed it.
But I still get this error in system.log
ERROR [SSTableBatchOpen:4] 2015-10-23 18:57:45,035 SSTableReader.java:506 - Corrupt sstable /var/lib/cassandra/data/keyspace1/location-777a33d0772911e597a98b820c5778a4/la-1709-big=[TOC.txt, CompressionInfo.db, Statistics.db, Digest.adler32, Data.db, Index.db, Filter.db]; skipping table
org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.EOFException
at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:125) ~[apache-cassandra-2.2.3.jar:2.2.3]
at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:86) ~[apache-cassandra-2.2.3.jar:2.2.3]
at org.apache.cassandra.io.util.CompressedSegmentedFile$Builder.metadata(CompressedSegmentedFile.java:142) ~[apache-cassandra-2.2.3.jar:2.2.3]
at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:101) ~[apache-cassandra-2.2.3.jar:2.2.3]
at org.apache.cassandra.io.util.SegmentedFile$Builder.complete(SegmentedFile.java:187) ~[apache-cassandra-2.2.3.jar:2.2.3]
at org.apache.cassandra.io.util.SegmentedFile$Builder.complete(SegmentedFile.java:179) ~[apache-cassandra-2.2.3.jar:2.2.3]
at org.apache.cassandra.io.sstable.format.SSTableReader.load(SSTableReader.java:703) ~[apache-cassandra-2.2.3.jar:2.2.3]
at org.apache.cassandra.io.sstable.format.SSTableReader.load(SSTableReader.java:664) ~[apache-cassandra-2.2.3.jar:2.2.3]
at org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:458) ~[apache-cassandra-2.2.3.jar:2.2.3]
at org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:363) ~[apache-cassandra-2.2.3.jar:2.2.3]
at org.apache.cassandra.io.sstable.format.SSTableReader$4.run(SSTableReader.java:501) ~[apache-cassandra-2.2.3.jar:2.2.3]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_60]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_60]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_60]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_60]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
Caused by: java.io.EOFException: null
at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340) ~[na:1.8.0_60]
at java.io.DataInputStream.readUTF(DataInputStream.java:589) ~[na:1.8.0_60]
at java.io.DataInputStream.readUTF(DataInputStream.java:564) ~[na:1.8.0_60]
at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:96) ~[apache-cassandra-2.2.3.jar:2.2.3]
... 15 common frames omitted
INFO [main] 2015-10-23 18:57:45,193 ColumnFamilyStore.java:382 - Initializing system_auth.role_permissions
INFO [main] 2015-10-23 18:57:45,201 ColumnFamilyStore.java:382 - Initializing system_auth.resource_role_permissons_index
INFO [main] 2015-10-23 18:57:45,213 ColumnFamilyStore.java:382 - Initializing system_auth.roles
INFO [main] 2015-10-23 18:57:45,233 ColumnFamilyStore.java:382 - Initializing system_auth.role_members
INFO [main] 2015-10-23 18:57:45,240 ColumnFamilyStore.java:382 - Initializing system_traces.sessions
INFO [main] 2015-10-23 18:57:45,252 ColumnFamilyStore.java:382 - Initializing system_traces.events
INFO [main] 2015-10-23 18:57:45,265 ColumnFamilyStore.java:382 - Initializing simplex.songs
INFO [main] 2015-10-23 18:57:45,276 ColumnFamilyStore.java:382 - Initializing simplex.playlists
INFO [pool-2-thread-1] 2015-10-23 18:57:45,289 AutoSavingCache.java:187 - reading saved cache /var/lib/cassandra/saved_caches/KeyCache-ca.db
INFO [pool-2-thread-1] 2015-10-23 18:57:45,313 AutoSavingCache.java:163 - Completed loading (25 ms; 36 keys) KeyCache cache
INFO [main] 2015-10-23 18:57:45,351 CommitLog.java:168 - Replaying /var/lib/cassandra/commitlog/CommitLog-5-1445578022702.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022703.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022704.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022705.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022706.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022707.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022708.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022709.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022710.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022712.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022713.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022714.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022715.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022716.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022717.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022719.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022720.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022721.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022722.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022723.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022724.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022725.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022727.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022728.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022730.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022731.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022732.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022733.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022734.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022736.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022738.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022740.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022741.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022743.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022744.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022745.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022746.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022748.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022749.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022750.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022751.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022752.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022753.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022755.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022756.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022758.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022759.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022760.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022761.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022763.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022764.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022765.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022766.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022767.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022768.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022769.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022770.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022771.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022772.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022773.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022774.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022775.log, /var/lib/cassandra/commitlog/CommitLog-5-1445578022776.log, /var/lib/cassandra/commitlog/CommitLog-5-1445588991268.log, /var/lib/cassandra/commitlog/CommitLog-5-1445589094722.log, /var/lib/cassandra/commitlog/CommitLog-5-1445589149527.log, /var/lib/cassandra/commitlog/CommitLog-5-1445595828633.log, /var/lib/cassandra/commitlog/CommitLog-5-1445595898055.log, /var/lib/cassandra/commitlog/CommitLog-5-1445596033717.log, /var/lib/cassandra/commitlog/CommitLog-5-1445596400441.log, /var/lib/cassandra/commitlog/CommitLog-5-1445596601854.log, /var/lib/cassandra/commitlog/CommitLog-5-1445598032544.log, /var/lib/cassandra/commitlog/CommitLog-5-1445598758663.log, /var/lib/cassandra/commitlog/CommitLog-5-1445601112953.log, /var/lib/cassandra/commitlog/CommitLog-5-1445601937334.log, /var/lib/cassandra/commitlog/CommitLog-5-1445601985416.log, /var/lib/cassandra/commitlog/CommitLog-5-1445604504389.log, /var/lib/cassandra/commitlog/CommitLog-5-1445606516196.log
ERROR [main] 2015-10-23 18:59:05,091 JVMStabilityInspector.java:78 - Exiting due to error while processing commit log during initialization.
org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: Mutation checksum failure at 4110758 in CommitLog-5-1445578022776.log
at org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:622) [apache-cassandra-2.2.3.jar:2.2.3]
at org.apache.cassandra.db.commitlog.CommitLogReplayer.replaySyncSection(CommitLogReplayer.java:492) [apache-cassandra-2.2.3.jar:2.2.3]
at org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:388) [apache-cassandra-2.2.3.jar:2.2.3]
at org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:147) [apache-cassandra-2.2.3.jar:2.2.3]
at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:189) [apache-cassandra-2.2.3.jar:2.2.3]
at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:169) [apache-cassandra-2.2.3.jar:2.2.3]
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:273) [apache-cassandra-2.2.3.jar:2.2.3]
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:513) [apache-cassandra-2.2.3.jar:2.2.3]
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:622) [apache-cassandra-2.2.3.jar:2.2.3]
Exiting due to error while processing commit log during initialization.
How do I fix this? This is test data, I am okay with losing it. How would one handle this in production to avoid data loss?
Tried setting disk_failure_policy: ignore
so that I could run nodetool repair
once the server is up. But the server does not start even with this setting.
I am operating a single node and replication factor is 1. Would having more nodes and a >1 replication factor enabled me to fix an issue like this without data loss?
I am using Cassandra 2.2.3
Upvotes: 37
Views: 33996
Reputation: 2025
This is how I fixed the problem with commit logs. You should only do this if you don't care about preserving the state of your commit logs.
Try to restart cassandra using
sudo systemctl restart cassandra
Then I check
systemctl status cassandra
and see that the status is 'exited' so there is a problem. Check the logs for cassandra using
sudo less /var/log/cassandra/system.log
and see org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: Could not read commit log descriptor in file /var/lib/cassandra/commitlog/CommitLog-6-1498210233635.log
Because I don't care about preserving the state of Cassandra I delete all of the commit logs and it now boots up fine
sudo rm /var/lib/cassandra/commitlog/CommitLog*
sudo systemctl restart cassandra
systemctl status cassandra
(should confirm that it it now running)
Upvotes: 21
Reputation: 697
Simply goes in log directory in cassandra and delete the log files. It work fine....
Upvotes: 8
Reputation: 1313
Since you don't care about the data, removing files from \data\commitlogs should be easiest solution.
Upvotes: 43
Reputation: 1044
Having some replication would surely help you to fix this without data loss but it would come with a price.
Despite all your effort you cannot manage to recover your corrupted sstable. So you decide to remove it from your file system to start Cassandra again. If you do not have replication your data is lost. But if you have replication on the cluster, you can possibly fetch the data from other nodes. That is what nodetool repair
do !
So nodetool repair
does not repair corrupted sstable. Basicallynodetool repair
compare tables from node to node to find missing or inconsistent data and then repair it. You can find more information on how it works here.
However nodetool repair
is very expensive, it is long and uses a lot of cpu, disk and network. There is this good post about repair benefits and drawbacks.
Upvotes: 18