Reputation: 592
I am using Lucene.Net. I am writing code that should re-index the same folder after a certain interval. How do I re-index when the contents of that folder have already been indexed? Say I indexed 4 documents, and after 5 minutes none of them has changed: how should I handle that scenario? Also, if one of the files was updated recently, how do I re-index only that file, replacing or deleting its old index entry?
Upvotes: 1
Views: 1192
Reputation: 35136
In modern versions of Lucene.Net you can index the key as part of the document, and then prune the existing index based on the key:
var document = new Document()
{
    new StringField("DocumentId", source.DocumentId, Field.Store.YES),
    new StringField("DocumentPath", source.DocumentPath, Field.Store.YES),
    new TextField("Content", source.Text, Field.Store.NO),
    new TextField("Tags", source.Tags, Field.Store.YES),
};
...
var writer = GetIndexWriter();

// Delete existing records
var query = new BooleanQuery
{
    { new TermQuery(new Term("DocumentId", source.DocumentId)), Occur.MUST },
    { new TermQuery(new Term("DocumentPath", source.DocumentPath)), Occur.MUST },
};
writer.DeleteDocuments(query);

// Add new document
writer.AddDocument(document);
You could also, for example, store the last update timestamp as part of the index, and use that to determine when to re-index the file.
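As a rough sketch of that timestamp check, assuming the file's last-write time is kept in the index as a stored string field (the field name "LastModified" and the helper names below are hypothetical, not part of Lucene.Net), the decision itself needs nothing beyond the base class library:

```csharp
using System;
using System.Globalization;

public static class ReindexCheck
{
    // Converts a file's UTC last-write time into a culture-independent string,
    // suitable for storing in a (hypothetical) "LastModified" field.
    public static string ToStoredValue(DateTime lastWriteUtc) =>
        lastWriteUtc.Ticks.ToString(CultureInfo.InvariantCulture);

    // Returns true when the file should be (re)indexed: either nothing usable
    // was stored for it, or the stored timestamp differs from the file's current one.
    public static bool NeedsReindex(string storedValue, DateTime lastWriteUtc)
    {
        if (storedValue == null)
        {
            return true; // never indexed
        }

        if (!long.TryParse(storedValue, NumberStyles.None,
                           CultureInfo.InvariantCulture, out long storedTicks))
        {
            return true; // unreadable value: re-index to be safe
        }

        return storedTicks != lastWriteUtc.Ticks;
    }
}
```

One way to use it would be to look up the existing document by its key, read the stored value, and call something like `ReindexCheck.NeedsReindex(storedValue, File.GetLastWriteTimeUtc(path))` before deciding whether to delete and re-add. Comparing for inequality rather than "newer than" also catches files restored with an older timestamp.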
There's categorically no need for an external database for this.
Note: You should use StringField for this, not TextField, so that you can match complex keys. For example, a TextField might convert an id like ABC-DEF into the tokens ABC and DEF, so a term query for the exact value ABC-DEF would fail.
Upvotes: 0
Reputation: 5246
Simply store the timestamp of each file, or a CRC, somewhere (e.g. a database).
You then crawl your filesystem and update only the files that changed using IndexWriter.UpdateDocument(), add new files using IndexWriter.AddDocument(), and delete files that no longer exist using IndexWriter.DeleteDocuments().
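The crawl described above might be sketched roughly as follows. This is only an illustration under assumptions: the `CrawlPlan` and `Crawler` names are hypothetical, the "somewhere" is represented by a plain dictionary of path to last-write ticks, and the actual IndexWriter calls are left as comments.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Result of comparing the previous run's state with the folder's current state.
public sealed class CrawlPlan
{
    public List<string> Added { get; } = new List<string>();   // -> IndexWriter.AddDocument()
    public List<string> Changed { get; } = new List<string>(); // -> IndexWriter.UpdateDocument()
    public List<string> Removed { get; } = new List<string>(); // -> IndexWriter.DeleteDocuments()
}

public static class Crawler
{
    // Records the UTC last-write ticks of every file under the folder.
    public static Dictionary<string, long> Snapshot(string folder)
    {
        var result = new Dictionary<string, long>(StringComparer.Ordinal);
        foreach (var path in Directory.EnumerateFiles(folder, "*", SearchOption.AllDirectories))
        {
            result[path] = File.GetLastWriteTimeUtc(path).Ticks;
        }
        return result;
    }

    // Classifies files as added, changed, or removed relative to the previous snapshot.
    public static CrawlPlan Plan(IReadOnlyDictionary<string, long> known,
                                 IReadOnlyDictionary<string, long> current)
    {
        var plan = new CrawlPlan();
        foreach (var entry in current)
        {
            if (!known.TryGetValue(entry.Key, out long previousTicks))
            {
                plan.Added.Add(entry.Key);
            }
            else if (previousTicks != entry.Value)
            {
                plan.Changed.Add(entry.Key);
            }
        }
        foreach (var path in known.Keys)
        {
            if (!current.ContainsKey(path))
            {
                plan.Removed.Add(path);
            }
        }
        return plan;
    }
}
```

When nothing has changed between two runs, all three lists come back empty and the index can be left untouched. After applying a plan, the current snapshot would be persisted as the `known` state for the next run.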
Upvotes: 1