Reputation: 35
I am integrating Apache Lucene into a Spring Boot application (this is my first experience with it) and everything works well, but I see a bunch of index files: .cfs, .si, .cfe. How can I combine them, and is it necessary to do so if I plan to reach 1 billion files in the index?
<dependency>
<groupId>org.apache.lucene</groupId>
<artifactId>lucene-core</artifactId>
<version>9.8.0</version>
</dependency>
To add new data to the index, I wrote the following simple method:
public synchronized void addToIndex(IndexData data) {
    Document doc = setDocument(data.id, data.body, data.coutry);
    try {
        writer.addDocument(doc);
        writer.commit();
        writer.maybeMerge();
        writer.flush();
        doc.clear();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
This method is located in a singleton class that holds the IndexWriter configuration: config.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);
Is calling maybeMerge() enough for Lucene to merge the files itself when needed?
Upvotes: 1
Views: 141
Reputation: 22032
Bottom line:
If you are not facing a specific problem, then there is probably nothing you need to change regarding how Lucene automatically manages segment merges.
More notes:
Yes, a Lucene index directory will contain "a bunch of files" - see Apache Lucene - Index File Formats for an overview.
Groups of related files form segments, where:
Each segment is a fully independent index, which could be searched separately.
Segments (and their related files) are automatically created and merged by Lucene, as it deems necessary/appropriate, as documents are added to (and removed from) the index. You do not need to take any specific action, unless you are facing a specific situation where a manually triggered merge may be beneficial.
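To see how the files in an index directory map onto segments, here is a minimal sketch that groups file names by their segment prefix. It is not part of Lucene: SegmentFileGrouper and its methods are hypothetical names, and it assumes the naming convention described in the Index File Formats documentation, where per-segment files start with an underscore followed by the segment name (for example _0.cfs, _0.cfe, _0.si), while files such as segments_N and write.lock belong to the index as a whole.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not a Lucene class): groups index file names by segment.
final class SegmentFileGrouper {

    // Returns the segment name for a per-segment file such as "_0.cfs" or
    // "_a_Lucene90_0.doc", or null for files that are not tied to one segment
    // (for example "segments_2" or "write.lock").
    static String segmentNameOf(String fileName) {
        if (!fileName.startsWith("_")) {
            return null;
        }
        // Assumption: the segment name ends at the first '_' or '.' after the
        // leading underscore.
        int end = fileName.length();
        for (int i = 1; i < fileName.length(); i++) {
            char c = fileName.charAt(i);
            if (c == '.' || c == '_') {
                end = i;
                break;
            }
        }
        return fileName.substring(0, end);
    }

    // Maps each segment name to the files that belong to it, sorted by segment name.
    static Map<String, List<String>> groupBySegment(List<String> fileNames) {
        Map<String, List<String>> bySegment = new TreeMap<>();
        for (String name : fileNames) {
            String segment = segmentNameOf(name);
            if (segment != null) {
                bySegment.computeIfAbsent(segment, k -> new ArrayList<>()).add(name);
            }
        }
        return bySegment;
    }
}
```

For instance, given the illustrative names _0.cfs, _0.cfe, _0.si, _1.cfs, _1.cfe, _1.si, segments_2 and write.lock, this groups the first three under segment _0, the next three under _1, and skips the last two.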
There is a performance cost associated with Lucene needing to search across multiple segments; conversely, there is a performance cost associated with performing a merge. My advice: You should assume Lucene knows best, and leave it to manage its segments itself, unless you are certain you have a good reason to do otherwise.
For example, see the JavaDoc for forceMerge(), where it states:
This is a horribly costly operation, especially when you pass a small maxNumSegments; usually you should only call this if the index is static (will no longer be changed).
For maybeMerge(), I'd give the same advice as above: leave it to Lucene, unless you have a very specific reason/problem to intervene. I would absolutely not want to call writer.maybeMerge() a billion times, on the off-chance that a merge may happen a few of those times.
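To make the trade-off between segment count (search cost) and merge work (write cost) more concrete, here is a toy simulation. It is not Lucene's actual merge policy: ToyMergeSimulation, its merge factor, and the rule "merge whenever N segments of the same size exist" are all simplifying assumptions chosen for illustration. Each flush creates a one-document segment, and merges cascade upwards on their own, with no manual trigger.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only (an assumption for illustration, not Lucene's merge policy):
// whenever mergeFactor segments of equal size exist, they are merged into one.
final class ToyMergeSimulation {
    private final int mergeFactor;
    // segmentsPerLevel.get(i) = number of segments holding mergeFactor^i documents
    private final List<Integer> segmentsPerLevel = new ArrayList<>();
    private long docsRewritten = 0;

    ToyMergeSimulation(int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be at least 2");
        }
        this.mergeFactor = mergeFactor;
    }

    // Adds a new one-document segment and performs any cascading merges.
    void flushOneDocument() {
        int level = 0;
        long segmentSize = 1;
        increment(level);
        while (segmentsPerLevel.get(level) == mergeFactor) {
            // Merge all segments on this level into one segment on the next level.
            segmentsPerLevel.set(level, 0);
            segmentSize *= mergeFactor;
            docsRewritten += segmentSize; // every document in the merge is copied
            level++;
            increment(level);
        }
    }

    private void increment(int level) {
        if (level == segmentsPerLevel.size()) {
            segmentsPerLevel.add(0);
        }
        segmentsPerLevel.set(level, segmentsPerLevel.get(level) + 1);
    }

    // Number of segments a search would have to visit.
    int segmentCount() {
        int total = 0;
        for (int count : segmentsPerLevel) {
            total += count;
        }
        return total;
    }

    // Total number of documents copied by merges so far.
    long docsRewritten() {
        return docsRewritten;
    }
}
```

In this model, with a merge factor of 10, 1234 single-document flushes leave 10 segments to search, at the cost of 3430 document rewrites. The point is only that automatic merging keeps the segment count small without anyone asking for a merge after every add; the real numbers in Lucene depend on its merge policy and settings.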
Upvotes: 2