Reputation: 18831
I'm really puzzled why it keeps dying with java.lang.OutOfMemoryError during indexing even though it has a few GBs of memory.
Is there a fundamental reason why it needs manual tweaking of config files / jvm parameters instead of it just figuring out how much memory is available and limiting itself to that? No other programs except Solr ever have this kind of problem.
Yes, I can keep tweaking JVM heap size every time such crashes happen, but this is all so backwards.
Here's stack trace of the latest such crash in case it is relevant:
SEVERE: java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:3209)
at java.lang.String.<init>(String.java:216)
at org.apache.lucene.index.TermBuffer.toTerm(TermBuffer.java:122)
at org.apache.lucene.index.SegmentTermEnum.term(SegmentTermEnum.java:169)
at org.apache.lucene.search.FieldCacheImpl$StringIndexCache.createValue(FieldCacheImpl.java:701)
at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:208)
at org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:676)
at org.apache.lucene.search.FieldComparator$StringOrdValComparator.setNextReader(FieldComparator.java:667)
at org.apache.lucene.search.TopFieldCollector$OneComparatorNonScoringCollector.setNextReader(TopFieldCollector.java:94)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:245)
at org.apache.lucene.search.Searcher.search(Searcher.java:171)
at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:988)
at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:884)
at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:341)
at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:182)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
at java.lang.Thread.run(Thread.java:619)
Upvotes: 15
Views: 15523
Reputation: 314
An old question but since I stumbled upon it:
Upvotes: 0
Reputation: 160
Upvotes: 0
Reputation: 25825
I was using this Java:
$ java -version
java version "1.6.0"
OpenJDK Runtime Environment (build 1.6.0-b09)
OpenJDK 64-Bit Server VM (build 1.6.0-b09, mixed mode)
Which was running out of heap space, but then I upgraded to this Java:
$ java -version
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
And now it works fine, on a huge dataset, with lots of term facets.
Upvotes: 0
Reputation: 17375
You are using the post.jar to index data? This jar has a bug in solr1.2/1.3 I think (but I don't know the details). Our company has fixed this internally and it should be also fixed in the latest trunk solr1.4/1.5.
Upvotes: 0
Reputation: 4168
I'm not certain there is a steadfast way to ensure you won't run into OutOfMemoryExceptions with Lucene. The problem you are facing is problem related to the use of FieldCache. From the Lucene API "Maintains caches of term values.". If your terms exceed the amount of memory allocated to the JVM you'll get the exception.
The documents are being sorted "at org.apache.lucene.search.FieldComparator$StringOrdValComparator.setNextReader(FieldComparator.java:667)", which will take up as much memory as is needed to store the terms being sorted for the index.
You'll need to review projected size of the fields that are sortable and adjust the JVM settings accordingly.
Upvotes: 2
Reputation: 6241
a wild guess, the documents you are indexing are very large
Lucene by default only indexes the first 10,000 terms of a document to avoid OutOfMemory errors, you can overcome this limit see setMaxFieldLength
Also, you could call optimize() and close as soon as you are done with processing with Indexwriter()
a definite way is to profile and find the bottleneck =]
Upvotes: 0
Reputation: 12853
Looking at the stack trace, it looks like you are performing a search, and sorting by a field. If you need to sort by a field, internally Lucene needs to load up all the values of all the terms in the field into memory. If the field contains a lot of data, then it is very possible that you may run out of memory.
Upvotes: 3