Mandy

Reputation: 592

Reindexing in Lucene.Net

I am using Lucene.Net and writing code that re-indexes the same folder at a fixed interval. How should I handle re-indexing when the folder's contents have already been indexed? For example, say I indexed 4 documents and 5 minutes later none of them has changed: how should that case be handled? Also, if one of the files was updated recently, how do I re-index only that file, replacing or deleting its older index entry?

Upvotes: 1

Views: 1192

Answers (2)

Doug

Reputation: 35136

In modern versions of Lucene.Net you can index the key as part of the document, and then prune the existing index based on the key:

var document = new Document()
{
  new StringField("DocumentId", source.DocumentId, Field.Store.YES),
  new StringField("DocumentPath", source.DocumentPath, Field.Store.YES),
  new TextField("Content", source.Text, Field.Store.NO),
  new TextField("Tags", source.Tags, Field.Store.YES),
};

...

var writer = GetIndexWriter();

// Delete existing records
var query = new BooleanQuery
{
  {new TermQuery(new Term("DocumentId", source.DocumentId)), Occur.MUST},
  {new TermQuery(new Term("DocumentPath", source.DocumentPath)), Occur.MUST},
};
writer.DeleteDocuments(query);

// Add new document
writer.AddDocument(document);

You could also, for example, store the last update timestamp as part of the index, and use that to determine when to re-index the file.
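As a rough sketch of the timestamp idea, assuming Lucene.Net 4.8: the file's last-write time is stored as a string field, looked up by path before indexing, and the file is rewritten only when the stored value differs. The class name, the `LastModifiedTicks` field, and keying on `DocumentPath` alone are assumptions made for this illustration, and it uses `IndexWriter.UpdateDocument` (delete by term, then add) in place of the separate delete and add shown above.

```csharp
using System.Globalization;
using System.IO;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;

public static class TimestampReindexer
{
    // Hypothetical field names chosen for this sketch.
    private const string PathField = "DocumentPath";
    private const string ModifiedField = "LastModifiedTicks";

    // Re-indexes the file only when its last-write time differs from the
    // value stored in the index. Returns true if the document was written.
    public static bool ReindexIfChanged(IndexWriter writer, string path)
    {
        string currentTicks = File.GetLastWriteTimeUtc(path)
            .Ticks.ToString(CultureInfo.InvariantCulture);
        var key = new Term(PathField, path);

        // Near-real-time reader, so changes made through this writer are
        // visible even before a commit.
        using (DirectoryReader reader = DirectoryReader.Open(writer, true))
        {
            var searcher = new IndexSearcher(reader);
            TopDocs hits = searcher.Search(new TermQuery(key), 1);
            if (hits.ScoreDocs.Length > 0)
            {
                Document existing = searcher.Doc(hits.ScoreDocs[0].Doc);
                if (existing.Get(ModifiedField) == currentTicks)
                {
                    return false; // unchanged since it was last indexed
                }
            }
        }

        var document = new Document
        {
            new StringField(PathField, path, Field.Store.YES),
            new StringField(ModifiedField, currentTicks, Field.Store.YES),
            new TextField("Content", File.ReadAllText(path), Field.Store.NO),
        };

        // Replaces any document with the same path, or adds it if none exists.
        writer.UpdateDocument(key, document);
        return true;
    }
}
```

Opening a reader per file keeps the sketch short; for a whole folder it would likely be cheaper to open one reader per pass and commit once at the end.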

There's categorically no need for an external database for this.

Note: You should use StringField for this, not TextField, so that you can match complex keys. A TextField is analyzed, so it might split an id like ABC-DEF into the tokens ABC and DEF, and a term query for the exact value ABC-DEF would then fail to match.

Upvotes: 0

Jf Beaulac

Reputation: 5246

Simply store the timestamp of each file, or a CRC, somewhere (e.g. a database).

You then crawl your filesystem and update only the files that changed using IndexWriter.UpdateDocument(), add new files using IndexWriter.AddDocument(), and delete files that no longer exist using IndexWriter.DeleteDocuments().
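The crawl step can be sketched as a comparison between the timestamps recorded on the previous pass and the ones seen now. This is only an illustration in plain C# with no Lucene dependency; `SyncPlan`, `IndexSync.Plan` and the dictionary-based storage are hypothetical names and choices, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Paths grouped by the index operation they need.
public sealed class SyncPlan
{
    public List<string> ToAdd { get; } = new List<string>();
    public List<string> ToUpdate { get; } = new List<string>();
    public List<string> ToDelete { get; } = new List<string>();
}

public static class IndexSync
{
    // previous: path -> last-write time recorded on the previous crawl
    // current:  path -> last-write time observed on this crawl
    public static SyncPlan Plan(
        IReadOnlyDictionary<string, DateTime> previous,
        IReadOnlyDictionary<string, DateTime> current)
    {
        var plan = new SyncPlan();

        foreach (KeyValuePair<string, DateTime> entry in current)
        {
            if (!previous.TryGetValue(entry.Key, out DateTime seen))
            {
                plan.ToAdd.Add(entry.Key);      // never indexed before
            }
            else if (seen != entry.Value)
            {
                plan.ToUpdate.Add(entry.Key);   // timestamp changed
            }
            // Otherwise the file is unchanged and needs no work.
        }

        // Files known from the previous crawl that no longer exist.
        plan.ToDelete.AddRange(previous.Keys.Where(p => !current.ContainsKey(p)));
        return plan;
    }
}
```

Each list then maps onto one of the calls named above: `ToAdd` to AddDocument, `ToUpdate` to UpdateDocument, and `ToDelete` to DeleteDocuments, each keyed on a term built from the path. If nothing changed between two passes, all three lists are empty and the index is left alone.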

Upvotes: 1
