Lionel Raeymaekers

Reputation: 11

Lucene document update with Field.NoStore

I'm using Lucene.Net 3.0.3.0.

I have an index whose documents contain several fields, some of which are not stored. When a client is updated, I want to update the ClientName field of every document belonging to that client. The problem is that when I fetch all the documents for the client, update the field and put each document back in the index, the fields that are not stored are lost.

Take a document like this:

DocId : 10
ClientId : 125
ClientName (Not stored) : Google
DocTitle (Not stored): Some title
DocContent (Not stored): content of my document

When the client name changes, I have to update the ClientName field. I search the index for all the docs with ClientId 125, replace the ClientName field on each one, and then delete and re-insert the doc in the index. But the DocTitle and DocContent fields are lost in the process.

How can I update the ClientName field without losing the other non-stored fields?

Edit: As suggested, here is the piece of code that iterates over the search results and updates a field on each document:

for (int i = 0; i < collector.Docs.Count; i++)
{
    // retrieve the current document from the search result
    var oldDocument = searcher.Doc(collector.Docs[i]);
    var reviewId = oldDocument.Get(FIELD_ID);
    var updateTerm = new Term(FIELD_ID, reviewId);

    oldDocument.RemoveFields(FIELD_CLIENT_NAME);
    oldDocument.Add(new Field(FIELD_CLIENT_NAME, newClientName, Field.Store.NO, Field.Index.NOT_ANALYZED));

    //writer is an instance of Lucene.Net.Index.IndexWriter
    writer.UpdateDocument(updateTerm, oldDocument);
}

After this, the document is saved correctly in the index, but every field that is not stored (Field.Store.NO) is lost, except FIELD_CLIENT_NAME, which I just re-added.

Upvotes: 1

Views: 540

Answers (2)

AaronLS

Reputation: 38394

I've confirmed this behavior as well. A document read back from the index only contains its stored fields, so any indexed field that is not marked as Stored is lost when that document is written back.

My workaround has been to find a way to retrieve the non-stored fields on demand, to facilitate adding them back into the document during an update operation. If a field is very small, I just index it as Stored from the start so that it doesn't need to be re-added during document updates.
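A minimal sketch of the first option, assuming the original values are still available outside the index. `SourceRecord`, `ClientNameUpdater`, the field names and the ANALYZED/NOT_ANALYZED choices are made up for illustration; only the Lucene.Net 3.0.3 calls (`Document.Add`, the `Field` constructor, `IndexWriter.UpdateDocument`, `IndexWriter.Commit`) are real. The idea is to build the whole document from source data every time, rather than starting from what `searcher.Doc()` returns.

```csharp
using System.Collections.Generic;
using Lucene.Net.Documents;
using Lucene.Net.Index;

// Hypothetical shape of one source record. The real values would come from
// wherever the documents were originally indexed from (database, files, ...).
public sealed class SourceRecord
{
    public string DocId { get; set; }
    public string ClientId { get; set; }
    public string ClientName { get; set; }
    public string DocTitle { get; set; }
    public string DocContent { get; set; }
}

public static class ClientNameUpdater
{
    // Builds the complete Lucene document from source data, so non-stored
    // fields are re-indexed instead of being dropped.
    public static Document BuildDocument(SourceRecord r)
    {
        var doc = new Document();
        doc.Add(new Field("DocId", r.DocId, Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.Add(new Field("ClientId", r.ClientId, Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.Add(new Field("ClientName", r.ClientName, Field.Store.NO, Field.Index.NOT_ANALYZED));
        // Assumption: title and content are full-text fields, hence ANALYZED.
        doc.Add(new Field("DocTitle", r.DocTitle, Field.Store.NO, Field.Index.ANALYZED));
        doc.Add(new Field("DocContent", r.DocContent, Field.Store.NO, Field.Index.ANALYZED));
        return doc;
    }

    // Replaces each of the client's documents with a freshly built one
    // carrying the new client name.
    public static void RenameClient(IndexWriter writer,
                                    IEnumerable<SourceRecord> records,
                                    string newClientName)
    {
        foreach (var r in records)
        {
            r.ClientName = newClientName;
            // UpdateDocument deletes every document matching the term,
            // then adds the new document.
            writer.UpdateDocument(new Term("DocId", r.DocId), BuildDocument(r));
        }
        writer.Commit();
    }
}
```

The caller is responsible for loading the `SourceRecord`s for the affected client (for example, everything with ClientId 125) before calling `RenameClient`.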

This is a real shame, as I'd happily accept some index fragmentation in exchange for not having to store the data. Leaving most fields unstored saves me a lot of space, which is what lets me embed indexes and offer client-side searches.

Upvotes: 0

Allen Chou

Reputation: 1237

I'm not sure what's going on under the hood, but one thing is for sure: what you are trying is not the way to update a document in Lucene.

Check the documentation of the removeField method in org.apache.lucene.document.Document:

Note that the removeField(s) methods like the add method only make sense prior to adding a document to an index. These methods cannot be used to change the content of an existing index! In order to achieve this, a document has to be deleted from an index and a new changed version of that document has to be added.

As you know, the update method will first delete the document(s) matching the provided term and then add the new one.

One possible approach, based on the practice in our project, is as follows.

Assume all of the documents are indexed from a database. Once a database record gets updated, you have to update the corresponding Lucene document as well.

The pseudocode might look like this:

  indexWriter.updateDocument(
    // databaseRecordId is normally the primary key of the updated record
    new Term("docId", databaseRecordId),
    indexedDocument // rebuilt from the updated database record
  );

If your Lucene documents are not "coming from" database records, this approach does not apply.

Anyway, hope it helps.

Upvotes: 0
