Adams.H

Reputation: 1653

Problems when improving Lucene indexing performance by reusing Document and Field instances

There are many ways to improve Lucene indexing performance. I have followed several tips from the ImproveIndexingSpeed page, including:

  1. Indexing from multiple threads by overriding several IndexWriter methods (addDocument and updateDocument). This gave a large performance improvement (about 7 to 8 times faster).
  2. Re-using Document and Field instances. The tip says: "It's best to create a single Document instance, then add multiple Field instances to it, but hold onto these Field instances and re-use them by changing their values for each added document".

The first tip brings good performance improvement, but the second one does not.

I made the Document and Field instances static so that they are not instantiated every time, which saves creation overhead and resources.

private static Document doc = new Document();
private static Field uinField = new StringField("uin", "", Store.YES);
private static Field nameField = new StringField("name", "", Store.YES);
private static Field urlField = new StringField("url", "", Store.YES);
private static Field servField = new TextField("services", "", Store.YES);

I used Field's setStringValue method to change the values, then added the fields to the doc instance.

uinField.setStringValue(String.valueOf(p.getUin()));
nameField.setStringValue(p.getName());
urlField.setStringValue(p.getUrl());
servField.setStringValue(p.getService());    
doc.add(uinField);
doc.add(nameField);
doc.add(urlField);
doc.add(servField);

When I ran the indexing, the process got stuck in what looks like an endless loop. My guess is that this is a side effect of the multithreading: the Document and Field instances are locked, which prevents other threads from calling addDocument.


My question is:

What is wrong with the "reuse" part? I think there must be something wrong with my implementation, because the docs don't mention that reusing Document and Field instances is incompatible with a multithreaded design.

Any suggestions on how to implement reuse of Document and Field instances would be appreciated.

Upvotes: 0

Views: 884

Answers (2)

Stone305585

Reputation: 101

You don't need to add the fields to the doc on every iteration. Add them once, outside the loop, and call only field.setStringValue and writer.addDocument inside the loop, like this:

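Document doc = new Document();
// Create each Field once and attach it to the Document once, outside the loop.
Field field1 = new TextField("field1", "", Field.Store.YES);
doc.add(field1);
Field field2 = new StringField("field2", "", Field.Store.YES);
doc.add(field2);
while ((line = br.readLine()) != null) {
    // Only the values change per record. field1Value and field2Value are
    // placeholders for whatever you derive from the current line.
    field1.setStringValue(field1Value);
    field2.setStringValue(field2Value);

    writer.addDocument(doc);
}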
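To illustrate why moving the add calls out of the loop matters: Lucene's Document.add appends a field and never replaces an existing one, so a shared Document that gets the same fields added for every record keeps growing. The plain-Java sketch below uses a hypothetical stand-in class (Doc) instead of the real Lucene types so that it runs without Lucene on the classpath; it only counts fields and is not the asker's actual code.

```java
import java.util.ArrayList;
import java.util.List;

final class FieldAccumulation {
    // Hypothetical stand-in for Lucene's Document: add() appends, it never replaces.
    static final class Doc {
        final List<String[]> fields = new ArrayList<>();
        void add(String[] field) { fields.add(field); }
    }

    // Pattern from the question: the same field objects are re-added for every record.
    static int addInsideLoop(int records) {
        Doc doc = new Doc();
        String[] uin = {"uin", ""};
        String[] name = {"name", ""};
        for (int i = 0; i < records; i++) {
            uin[1] = String.valueOf(i);
            name[1] = "name" + i;
            doc.add(uin);   // grows the document by one entry per call
            doc.add(name);
            // writer.addDocument(doc) would be called here in real code
        }
        return doc.fields.size();
    }

    // Pattern from this answer: fields are attached once, only their values change.
    static int addOnce(int records) {
        Doc doc = new Doc();
        String[] uin = {"uin", ""};
        String[] name = {"name", ""};
        doc.add(uin);
        doc.add(name);
        for (int i = 0; i < records; i++) {
            uin[1] = String.valueOf(i);
            name[1] = "name" + i;
            // writer.addDocument(doc) would be called here in real code
        }
        return doc.fields.size();
    }

    public static void main(String[] args) {
        System.out.println("fields after adding inside the loop: " + addInsideLoop(1000));
        System.out.println("fields after adding once:            " + addOnce(1000));
    }
}
```

With the first pattern every addDocument call has more fields to process than the previous one, which may be part of why the indexing run in the question appears never to finish.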

Upvotes: 2

yetuweiba

Reputation: 241

Well, I have read the ImproveIndexingSpeed tips. The tip "Re-use Document and Field instances" comes with a note:
"Note that you cannot re-use a single Field instance within a Document, and, you should not change a Field's value until the Document containing that Field has been added to the index. See Field for details."

So I think you have to make sure the fields have been written to the index; only after that can the Field instances be reused. However, I don't know of a way to tell when a field has actually been written to the index. If you know one, please tell me. Thank you.

Apologies for my poor English.
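One way to satisfy the note in a multithreaded indexer is to give every worker thread its own reusable instances, so that no thread can change a value while another thread is still adding the same document. The sketch below shows that pattern with ThreadLocal. It is plain Java with hypothetical stand-in classes (ReusableField, ReusableDoc) rather than the real Lucene types, and it assumes that addDocument has finished reading the document by the time it returns to the calling thread, which is worth confirming for the Lucene version in use.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class PerThreadReuse {
    // Hypothetical stand-in for a Lucene Field: a named slot whose value is replaced per record.
    static final class ReusableField {
        final String name;
        String value = "";
        ReusableField(String name) { this.name = name; }
    }

    // Hypothetical stand-in for a Document whose fields are attached once, at construction.
    static final class ReusableDoc {
        final ReusableField uin = new ReusableField("uin");
        final ReusableField name = new ReusableField("name");
    }

    // One holder per worker thread: no two threads ever mutate the same instance.
    private static final ThreadLocal<ReusableDoc> DOC = ThreadLocal.withInitial(ReusableDoc::new);

    // Stand-in for the index: records what each "addDocument" call observed.
    static final Set<String> indexed = ConcurrentHashMap.newKeySet();

    static void index(int uin, String name) {
        ReusableDoc doc = DOC.get();          // this thread's private instance
        doc.uin.value = String.valueOf(uin);
        doc.name.value = name;
        // In real code this is where writer.addDocument(doc) would be called.
        indexed.add(doc.uin.value + ":" + doc.name.value);
    }

    // Indexes the given number of records on a fixed pool and returns how many distinct entries were recorded.
    static int run(int threads, int records) {
        indexed.clear();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < records; i++) {
            final int uin = i;
            pool.submit(() -> index(uin, "name" + uin));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return indexed.size();
    }

    public static void main(String[] args) {
        System.out.println("distinct documents recorded: " + run(4, 1000));
    }
}
```

Each thread changes its fields only after its own previous add call has returned, so the ordering the note asks for holds per thread without any extra locking. The cost is one Document and one set of Field objects per worker thread instead of one per record.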

Upvotes: 1
