gmoore

Reputation: 5566

Lucene Java opening too many files. Am I using IndexWriter properly?

My Lucene Java implementation is opening too many files. I followed the instructions in the Lucene Wiki about too many open files, but that only slowed the problem down. Here is my code to add objects (PTicket) to the index:

//This gets called when the bean is instantiated
public void initializeIndex() {
    analyzer = new WhitespaceAnalyzer(Version.LUCENE_32);
    config = new IndexWriterConfig(Version.LUCENE_32, analyzer);

}


public void addAllToIndex(Collection<PTicket> records) {  
    IndexWriter indexWriter = null;
    config = new IndexWriterConfig(Version.LUCENE_32, analyzer);

    try{
        indexWriter = new IndexWriter(directory, config);
        for(PTicket record : records) {
            Document doc = new Document();
            StringBuffer documentText = new StringBuffer();
            doc.add(new Field("_id", record.getIdAsString(), Field.Store.YES, Field.Index.ANALYZED));
            doc.add(new Field("_type", record.getType(), Field.Store.YES, Field.Index.ANALYZED));

            for(String key : record.getProps().keySet()) {
                List<String> vals = record.getProps().get(key);

                for(String val : vals) {
                    addToDocument(doc, key, val);
                    documentText.append(val).append(" ");
                }
            }
            addToDocument(doc, DOC_TEXT, documentText.toString());        
            indexWriter.addDocument(doc);    
        }

        indexWriter.optimize();
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        cleanup(indexWriter);
    }
}

private void cleanup(IndexWriter iw) {
    if(iw == null) {
        return;
    }

    try{
        iw.close();
    } catch (IOException ioe) {
        logger.error("Error trying to close index writer");
        logger.error("{}", ioe.getClass().getName());
        logger.error("{}", ioe.getMessage());
    }
}

private void addToDocument(Document doc, String field, String value) {
    doc.add(new Field(field, value, Field.Store.YES, Field.Index.ANALYZED));
}

EDIT TO ADD code for searching

public Set<Object> searchIndex(AthenaSearch search) {  

    try {
        Query q = new QueryParser(Version.LUCENE_32, DOC_TEXT, analyzer).parse(query);

        //searcher is actually instantiated during initialization.  Lucene recommends this.
        //IndexSearcher searcher = new IndexSearcher(directory, true);
        TopDocs topDocs = searcher.search(q, numResults);
        ScoreDoc[] hits = topDocs.scoreDocs;
        for(int i=start;i<hits.length;++i) {
            int docId = hits[i].doc;
            Document d = searcher.doc(docId);
            ids.add(d.get("_id"));
        }
        return ids;
    } catch (Exception e) {
        e.printStackTrace();
        return null;
    }
}

This code is in a web application.

1) Is this the advised way to use IndexWriter (instantiating a new one on each add to index)?

2) I've read that raising ulimit will help, but that just seems like a band-aid that won't address the actual problem.

3) Could the problem lie with IndexSearcher?

Upvotes: 2

Views: 2645

Answers (4)

Narayan

Reputation: 6261

1) Is this the advised way to use IndexWriter (instantiating a new one on each add to index)?

No, I would advise against it. There are constructors that check whether an index already exists in the directory and either open it or create a new one. Problem 2 would be solved if you reuse the IndexWriter.

EDIT:

OK, it seems that in Lucene 3.2 all but one of the constructors are deprecated, so reuse of the IndexWriter can be achieved by using the enum IndexWriterConfig.OpenMode with the value CREATE_OR_APPEND.

Also, opening a new writer and closing it on each document add is not efficient; I suggest reusing it. If you want to speed up indexing, adjust the RAM buffer size with setRAMBufferSizeMB (the default is 16 MB) and find a good value by trial and error.

from the docs:

Note that you can open an index with create=true even while readers are using the index. The old readers will continue to search the "point in time" snapshot they had opened, and won't see the newly created index until they re-open.

Also reuse the IndexSearcher. I cannot see the code for searching, but IndexSearcher is thread-safe and can be opened read-only as well.

I also suggest setting a merge factor on the writer. This is not necessary, but it helps limit the number of inverted index files that get created. Tune it by trial and error.
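
To illustrate the reuse advice, here is a minimal sketch of the "one long-lived writer" pattern. It deliberately does not use Lucene: FakeWriter is a hypothetical in-memory stand-in for IndexWriter so the example runs on its own, and SharedIndex, addAll, openCount and committedCount are names made up for this illustration. The point is only that the writer is opened once, each batch ends with a commit instead of a close, and the writer is closed once at shutdown.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory stand-in for IndexWriter (not a Lucene class). */
class FakeWriter {
    private final List<String> pending = new ArrayList<String>();
    private final List<String> committed = new ArrayList<String>();
    private boolean closed = false;

    void add(String doc) { pending.add(doc); }

    /** Flush the pending batch; the writer stays open afterwards. */
    void commit() { committed.addAll(pending); pending.clear(); }

    void close() { commit(); closed = true; }

    int committedCount() { return committed.size(); }
    boolean isClosed() { return closed; }
}

/** Owns a single writer for the lifetime of the application. */
class SharedIndex {
    private FakeWriter writer;   // created lazily, then reused
    private int openCount = 0;   // how many writers were ever opened

    private FakeWriter writer() {
        if (writer == null) {
            writer = new FakeWriter();
            openCount++;
        }
        return writer;
    }

    /** Add a batch and commit it, without closing the writer. */
    synchronized void addAll(List<String> docs) {
        FakeWriter w = writer();
        for (String doc : docs) {
            w.add(doc);
        }
        w.commit();
    }

    synchronized int openCount() { return openCount; }

    synchronized int committedCount() {
        return writer == null ? 0 : writer.committedCount();
    }

    /** Call once, e.g. when the web application is stopped. */
    synchronized void shutdown() {
        if (writer != null) {
            writer.close();
            writer = null;
        }
    }
}

class SharedIndexDemo {
    public static void main(String[] args) {
        SharedIndex index = new SharedIndex();
        index.addAll(java.util.Arrays.asList("ticket-1", "ticket-2"));
        index.addAll(java.util.Arrays.asList("ticket-3"));
        System.out.println("writers opened: " + index.openCount());
        index.shutdown();
    }
}
```

In the question's code, the equivalent change would be to keep the IndexWriter in a field, commit at the end of addAllToIndex, and move the close call to wherever the bean is destroyed.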

Upvotes: 3

Shashikant Kore

Reputation: 5052

This question is probably a duplicate of Too many open files Error on Lucene

I am repeating my answer from there.

Use the compound file format to reduce the file count. When this flag is set, Lucene will write a segment as a single .cfs file instead of multiple files. This will reduce the number of files significantly.

IndexWriter.setUseCompoundFile(true)

Upvotes: 0

M Platvoet

Reputation: 1654

The scientifically correct answer would be: you can't really tell from this fragment of code.

The more constructive answer would be: you have to make sure that only one IndexWriter is writing to the index at any given time, and you therefore need some mechanism to guarantee that. So my answer depends on what you want to accomplish:

  • do you want a deeper understanding of Lucene? or..
  • do you just want to build and use an index?

If your answer is the latter, you probably want to look at projects like Solr, which hide all the index reading and writing.

Upvotes: 0

Adrian Conlon

Reputation: 3941

I think we'd need to see your search code to be sure, but I'd suspect that it is a problem with the index searcher. More specifically, make sure that your index reader is being properly closed when you've finished with it.
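
As a rough sketch of what "properly closed" can look like when a shared searcher is periodically replaced: the holder below closes the previous instance every time it swaps in a new one. FakeSearcher is a hypothetical stand-in for an IndexSearcher and its underlying reader (it is not a Lucene class), and SearcherHolder and refresh are names invented for this example.

```java
import java.io.Closeable;

/** Hypothetical stand-in for a searcher that holds index files open. */
class FakeSearcher implements Closeable {
    private boolean closed = false;

    @Override
    public void close() { closed = true; }

    boolean isClosed() { return closed; }
}

/** Holds the current searcher and closes the old one on every swap. */
class SearcherHolder {
    private FakeSearcher current = new FakeSearcher();

    synchronized FakeSearcher get() { return current; }

    /** Swap in a fresh searcher, e.g. after new documents were committed. */
    synchronized FakeSearcher refresh() {
        FakeSearcher old = current;
        current = new FakeSearcher();
        old.close();   // skipping this step leaves the old files open
        return current;
    }

    /** Call once when the application shuts down. */
    synchronized void shutdown() { current.close(); }
}

class SearcherHolderDemo {
    public static void main(String[] args) {
        SearcherHolder holder = new SearcherHolder();
        FakeSearcher first = holder.get();
        holder.refresh();
        System.out.println("old searcher closed: " + first.isClosed());
        holder.shutdown();
    }
}
```

This sketch assumes no search is still running against the old searcher at the moment of the swap. In a web application with concurrent requests you would likely need some form of reference counting before closing it.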

Good luck,

Upvotes: 1
