Reputation: 25
I'm getting a lot of the following exception while bulk-loading millions of records via sstableloader:
ERROR [Lucene Merge Thread #132642] 2014-07-29 00:35:01,252 CassandraDaemon.java (line 199) Exception in thread Thread[Lucene Merge Thread #132642,6,main]
org.apache.lucene.index.MergePolicy$MergeException: java.lang.IllegalStateException: failed
at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:545)
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:518)
Caused by: java.lang.IllegalStateException: failed
at org.apache.lucene.util.packed.DirectPackedReader.get(DirectPackedReader.java:93)
at org.apache.lucene.util.packed.BlockPackedReader.get(BlockPackedReader.java:86)
at org.apache.lucene.util.LongValues.get(LongValues.java:35)
at org.apache.lucene.codecs.lucene45.Lucene45DocValuesProducer$5.getOrd(Lucene45DocValuesProducer.java:459)
at org.apache.lucene.codecs.DocValuesConsumer$4$1.setNext(DocValuesConsumer.java:389)
at org.apache.lucene.codecs.DocValuesConsumer$4$1.hasNext(DocValuesConsumer.java:352)
at org.apache.lucene.codecs.lucene45.Lucene45DocValuesConsumer.addNumericField(Lucene45DocValuesConsumer.java:141)
at org.apache.lucene.codecs.lucene45.Lucene45DocValuesConsumer.addSortedField(Lucene45DocValuesConsumer.java:350)
at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.addSortedField(PerFieldDocValuesFormat.java:116)
at org.apache.lucene.codecs.DocValuesConsumer.mergeSortedField(DocValuesConsumer.java:305)
at org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:197)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:116)
at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4058)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3655)
at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
Caused by: java.io.EOFException: Read past EOF (resource: BBIndexInput(name=_13ms5_Lucene45_0.dvd))
at com.datastax.bdp.search.lucene.store.bytebuffer.ByteBufferIndexInput.switchCurrentBuffer(ByteBufferIndexInput.java:188)
at com.datastax.bdp.search.lucene.store.bytebuffer.ByteBufferIndexInput.readByte(ByteBufferIndexInput.java:129)
at org.apache.lucene.store.DataInput.readShort(DataInput.java:77)
at com.datastax.bdp.search.lucene.store.bytebuffer.ByteBufferIndexInput.readShort(ByteBufferIndexInput.java:89)
at org.apache.lucene.util.packed.DirectPackedReader.get(DirectPackedReader.java:64)
... 15 more
I see from the exception trace that it has something to do with Long values and EOF. However, I have no idea what is triggering the error. The SSTable files that I'm trying to import were generated by a Java program (written by me) that uses org.apache.cassandra.io.sstable.CQLSSTableWriter.
The CF schema, Solr schema, and SSTable generator code can be found here: https://www.dropbox.com/sh/1rpo3ixmz1bg9y2/AAA3aqlfzWEsNIwy79G9dASba
PS:
I initially encountered the error on 3 out of 6 nodes. I restarted all of them and was then able to import 150+ million records without an error. But when I left the imports running unattended while I slept, the error resurfaced on 1 out of 6 nodes.
I'm getting quite alarmed now because the number of indexed records on each node (according to the Solr admin UI) is approximately 60,000 smaller than the number of Cassandra rows (according to nodetool cfstats).
UPDATE:
I'm still experiencing this. The discrepancy between the number of indexed documents (Solr) and stored documents (Cassandra cfstats) is growing day by day.
UPDATE (2014-08-13):
I changed the directory factory as suggested by Rock Brain, but the error recurred within a few hours of continuous import via sstableloader.
UPDATE (2014-08-14):
Interestingly, I noticed that I'm actually getting two similar exceptions (the only difference being the stack trace of the last "Caused by"):
Exception 1:
ERROR [Lucene Merge Thread #24937] 2014-08-14 06:20:32,270 CassandraDaemon.java (line 199) Exception in thread Thread[Lucene Merge Thread #24937,6,main]
org.apache.lucene.index.MergePolicy$MergeException: java.lang.IllegalStateException: failed
at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:545)
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:518)
Caused by: java.lang.IllegalStateException: failed
at org.apache.lucene.util.packed.DirectPackedReader.get(DirectPackedReader.java:93)
at org.apache.lucene.util.packed.BlockPackedReader.get(BlockPackedReader.java:86)
at org.apache.lucene.util.LongValues.get(LongValues.java:35)
at org.apache.lucene.codecs.lucene45.Lucene45DocValuesProducer$5.getOrd(Lucene45DocValuesProducer.java:459)
at org.apache.lucene.codecs.DocValuesConsumer$4$1.setNext(DocValuesConsumer.java:389)
at org.apache.lucene.codecs.DocValuesConsumer$4$1.hasNext(DocValuesConsumer.java:352)
at org.apache.lucene.codecs.lucene45.Lucene45DocValuesConsumer.addNumericField(Lucene45DocValuesConsumer.java:141)
at org.apache.lucene.codecs.lucene45.Lucene45DocValuesConsumer.addSortedField(Lucene45DocValuesConsumer.java:350)
at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.addSortedField(PerFieldDocValuesFormat.java:116)
at org.apache.lucene.codecs.DocValuesConsumer.mergeSortedField(DocValuesConsumer.java:305)
at org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:197)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:116)
at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4058)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3655)
at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
Caused by: java.io.EOFException: Read past EOF (resource: BBIndexInput(name=_67nex_Lucene45_0.dvd))
at com.datastax.bdp.search.lucene.store.bytebuffer.ByteBufferIndexInput.switchCurrentBuffer(ByteBufferIndexInput.java:188)
at com.datastax.bdp.search.lucene.store.bytebuffer.ByteBufferIndexInput.readByte(ByteBufferIndexInput.java:129)
at org.apache.lucene.util.packed.DirectPackedReader.get(DirectPackedReader.java:64)
... 15 more
Exception 2 (exactly the same as the original exception on top of this post):
ERROR [Lucene Merge Thread #24936] 2014-08-14 06:20:34,694 CassandraDaemon.java (line 199) Exception in thread Thread[Lucene Merge Thread #24936,6,main]
org.apache.lucene.index.MergePolicy$MergeException: java.lang.IllegalStateException: failed
at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:545)
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:518)
Caused by: java.lang.IllegalStateException: failed
at org.apache.lucene.util.packed.DirectPackedReader.get(DirectPackedReader.java:93)
at org.apache.lucene.util.packed.BlockPackedReader.get(BlockPackedReader.java:86)
at org.apache.lucene.util.LongValues.get(LongValues.java:35)
at org.apache.lucene.codecs.lucene45.Lucene45DocValuesProducer$5.getOrd(Lucene45DocValuesProducer.java:459)
at org.apache.lucene.codecs.DocValuesConsumer$4$1.setNext(DocValuesConsumer.java:389)
at org.apache.lucene.codecs.DocValuesConsumer$4$1.hasNext(DocValuesConsumer.java:352)
at org.apache.lucene.codecs.lucene45.Lucene45DocValuesConsumer.addNumericField(Lucene45DocValuesConsumer.java:141)
at org.apache.lucene.codecs.lucene45.Lucene45DocValuesConsumer.addSortedField(Lucene45DocValuesConsumer.java:350)
at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.addSortedField(PerFieldDocValuesFormat.java:116)
at org.apache.lucene.codecs.DocValuesConsumer.mergeSortedField(DocValuesConsumer.java:305)
at org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:197)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:116)
at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4058)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3655)
at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
Caused by: java.io.EOFException: Read past EOF (resource: BBIndexInput(name=_67fvk_Lucene45_0.dvd))
at com.datastax.bdp.search.lucene.store.bytebuffer.ByteBufferIndexInput.switchCurrentBuffer(ByteBufferIndexInput.java:188)
at com.datastax.bdp.search.lucene.store.bytebuffer.ByteBufferIndexInput.readByte(ByteBufferIndexInput.java:129)
at org.apache.lucene.store.DataInput.readShort(DataInput.java:77)
at com.datastax.bdp.search.lucene.store.bytebuffer.ByteBufferIndexInput.readShort(ByteBufferIndexInput.java:89)
at org.apache.lucene.util.packed.DirectPackedReader.get(DirectPackedReader.java:64)
... 15 more
UPDATE part 2 (2014-08-14):
example RELOAD warning:
WARN [http-8983-2] 2014-08-14 08:31:28,828 CassandraCoreContainer.java (line 739) Too much waiting for new searcher...
WARN [http-8983-2] 2014-08-14 08:31:28,831 SolrCores.java (line 375) Tried to remove core myks.mycf from pendingCoreOps and it wasn't there.
INFO [http-8983-2] 2014-08-14 08:31:28,832 StorageService.java (line 2644) Starting repair command #3, repairing 0 ranges for keyspace solr_admin
INFO [http-8983-2] 2014-08-14 08:31:28,835 SolrDispatchFilter.java (line 672) [admin] webapp=null path=/admin/cores params={slave=true&deleteAll=false&name=myks.mycf&distributed=false&action=RELOAD&reindex=false&core=myks.mycf&wt=javabin&version=2} status=0 QTime=61640
UPDATE (2014-08-23):
I was no longer able to reproduce the exception after redoing the suggested workaround.
Upvotes: 2
Views: 1122
Reputation: 261
Update your solrconfig.xml for all your cores: swap the directoryFactory from com.datastax.bdp.cassandra.index.solr.DSENRTCachingDirectoryFactory to solr.MMapDirectoryFactory.
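As a rough sketch, the swapped entry in solrconfig.xml might look like the fragment below. It assumes the element uses the standard Solr form with name="DirectoryFactory"; your file may carry extra attributes or child settings, so adjust it to match what is already there rather than pasting it verbatim.

```xml
<!-- Before (assumed existing entry):
<directoryFactory name="DirectoryFactory"
                  class="com.datastax.bdp.cassandra.index.solr.DSENRTCachingDirectoryFactory"/>
-->

<!-- After: use the memory-mapped directory implementation instead -->
<directoryFactory name="DirectoryFactory"
                  class="solr.MMapDirectoryFactory"/>
```

The changed solrconfig.xml has to be applied to every core that shows the error, and each core will likely need a reload before the new directory factory takes effect.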
Also, which OS and JVM version are you using, how many CPUs do you have, and what are the heap size and total available memory? How many minutes or hours into the loading does the error occur?
Upvotes: 2