user692704

Reputation: 554

How to use TermVector in Lucene 4.0

In the indexing method I use the following line:

Field contentsField = new Field("contents", new FileReader(f), Field.TermVector.YES);

However, in Lucene 4.0 this constructor is deprecated and new TextField should be used instead of new Field.

But the problem with TextField is that it doesn't accept a TermVector in its constructors.

Is there a way to include the Term Vector in my indexing in Lucene 4.0 with the new constructors?

Thanks

Upvotes: 11

Views: 7202

Answers (3)

RonC

Reputation: 33897

I was stumped on this for a while. The other answers here are helpful, but even with them the situation was not obvious to me. So once the light finally went on, I decided to add this answer to make things a bit clearer for the next person.

The Field constructor that supports term vectors is deprecated because it uses the Field.TermVector enum, which is itself deprecated as of Lucene 4.0.

In Lucene 4.0, a new constructor was added to the Field class that accepts a FieldType instead. The FieldType class is more flexible than the old enum approach and lets you set even more field options than were previously available.

Here is an example of how to create a Text field, not stored, that supports term vectors by passing a FieldType object when instantiating a Field object.

     FieldType specialTextFieldType = new FieldType(TextField.TYPE_NOT_STORED);
     specialTextFieldType.StoreTermVectors = true;

     Document exampleDoc = new Document();
     exampleDoc.Add(new Field("SomeField", someData, specialTextFieldType ));
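The snippet above uses C#-style property syntax (StoreTermVectors = true, exampleDoc.Add), which looks like the Lucene.NET port and will not compile against Java Lucene 4.0. A minimal sketch of the same idea in Java, assuming Lucene 4.x (lucene-core) on the classpath; the class name TermVectorFieldExample and the helper buildDoc are hypothetical:

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexableField;

public class TermVectorFieldExample {

    // Copy TextField's "indexed, tokenized, not stored" settings, then
    // additionally ask Lucene to store a term vector for this field.
    static final FieldType SPECIAL_TEXT_FIELD_TYPE = new FieldType(TextField.TYPE_NOT_STORED);
    static {
        SPECIAL_TEXT_FIELD_TYPE.setStoreTermVectors(true);
        SPECIAL_TEXT_FIELD_TYPE.freeze(); // prevent further changes once the type is shared
    }

    // Hypothetical helper: builds a document whose "SomeField" is indexed with term vectors.
    static Document buildDoc(String someData) {
        Document exampleDoc = new Document();
        exampleDoc.add(new Field("SomeField", someData, SPECIAL_TEXT_FIELD_TYPE));
        return exampleDoc;
    }

    public static void main(String[] args) {
        IndexableField field = buildDoc("some data").getField("SomeField");
        System.out.println("storeTermVectors = " + field.fieldType().storeTermVectors());
    }
}
```

Copying TextField.TYPE_NOT_STORED keeps the field indexed and tokenized, so only the term vector flag has to be switched on.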

Upvotes: 0

amas

Reputation: 604

I had the same problem, so I simply created my own Field:

import java.io.Reader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;

/** Like TextField, but its field types also store term vectors with positions. */
public class VecTextField extends Field {

    /** Indexed, tokenized, not stored, with term vectors and positions. */
    public static final FieldType TYPE_NOT_STORED = new FieldType();

    /** Indexed, tokenized, stored, with term vectors and positions. */
    public static final FieldType TYPE_STORED = new FieldType();

    static {
        TYPE_NOT_STORED.setIndexed(true);
        TYPE_NOT_STORED.setTokenized(true);
        TYPE_NOT_STORED.setStoreTermVectors(true);
        TYPE_NOT_STORED.setStoreTermVectorPositions(true);
        TYPE_NOT_STORED.freeze();

        TYPE_STORED.setIndexed(true);
        TYPE_STORED.setTokenized(true);
        TYPE_STORED.setStored(true);
        TYPE_STORED.setStoreTermVectors(true);
        TYPE_STORED.setStoreTermVectorPositions(true);
        TYPE_STORED.freeze();
    }

    /** Creates a new un-stored VecTextField with a Reader value (Reader values cannot be stored). */
    public VecTextField(String name, Reader reader) {
        super(name, reader, TYPE_NOT_STORED);
    }

    /** Creates a new VecTextField with a String value. */
    public VecTextField(String name, String value, Store store) {
        super(name, value, store == Store.YES ? TYPE_STORED : TYPE_NOT_STORED);
    }

    /** Creates a new un-stored VecTextField with a TokenStream value. */
    public VecTextField(String name, TokenStream stream) {
        super(name, stream, TYPE_NOT_STORED);
    }
}

Hope this helps
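Once term vectors are stored, they can be read back per document. A minimal sketch, assuming Lucene 4.x with lucene-core and lucene-analyzers-common on the classpath; it uses the same flags as VecTextField.TYPE_NOT_STORED above, and the class name ReadTermVectors and the helper termFrequencies are hypothetical:

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.Version;

public class ReadTermVectors {

    // Same flags as VecTextField.TYPE_NOT_STORED.
    static final FieldType VEC_TYPE = new FieldType();
    static {
        VEC_TYPE.setIndexed(true);
        VEC_TYPE.setTokenized(true);
        VEC_TYPE.setStoreTermVectors(true);
        VEC_TYPE.setStoreTermVectorPositions(true);
        VEC_TYPE.freeze();
    }

    /** Indexes one document in memory and returns term -> in-document frequency from its term vector. */
    static Map<String, Long> termFrequencies(String text) throws IOException {
        Directory dir = new RAMDirectory();
        IndexWriterConfig cfg = new IndexWriterConfig(Version.LUCENE_40,
                new WhitespaceAnalyzer(Version.LUCENE_40));
        IndexWriter writer = new IndexWriter(dir, cfg);
        Document doc = new Document();
        doc.add(new Field("contents", text, VEC_TYPE));
        writer.addDocument(doc);
        writer.close();

        Map<String, Long> freqs = new LinkedHashMap<String, Long>();
        DirectoryReader reader = DirectoryReader.open(dir);
        try {
            // Term vector of document 0; null if no vector was stored for the field.
            Terms vector = reader.getTermVector(0, "contents");
            if (vector != null) {
                TermsEnum termsEnum = vector.iterator(null); // Lucene 4.x signature
                BytesRef term;
                while ((term = termsEnum.next()) != null) {
                    // Within a term vector, totalTermFreq() is the count inside this one document.
                    freqs.put(term.utf8ToString(), termsEnum.totalTermFreq());
                }
            }
        } finally {
            reader.close();
        }
        return freqs;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(termFrequencies("to be or not to be"));
    }
}
```

WhitespaceAnalyzer is used here only to keep the tokenization predictable; any analyzer works the same way.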

Upvotes: 14

jpountz

Reputation: 9964

TextField is a convenience class for users who need indexed fields without term vectors. If you need term vectors, just use a Field. It takes a few more lines of code, since you first need to create a FieldType instance, set indexed, tokenized and storeTermVectors to true, and then pass that FieldType instance to the Field constructor.
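Applied to the question's original Reader-based line, the FieldType-from-scratch approach might look like the sketch below. It assumes Lucene 4.x (lucene-core); the names ContentsFieldFactory and contentsField are hypothetical, and a StringReader stands in for the question's FileReader:

```java
import java.io.Reader;
import java.io.StringReader;

import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;

public class ContentsFieldFactory {

    // Built from scratch: a plain new FieldType() is not indexed in Lucene 4.x,
    // so indexing, tokenizing and term vectors are all switched on explicitly.
    static final FieldType CONTENTS_TYPE = new FieldType();
    static {
        CONTENTS_TYPE.setIndexed(true);
        CONTENTS_TYPE.setTokenized(true);
        CONTENTS_TYPE.setStoreTermVectors(true);
        CONTENTS_TYPE.freeze(); // immutable from here on
    }

    /** Stands in for the deprecated new Field("contents", reader, Field.TermVector.YES). */
    static Field contentsField(Reader reader) {
        // Reader-valued fields must not be stored, and CONTENTS_TYPE leaves stored = false.
        return new Field("contents", reader, CONTENTS_TYPE);
    }

    public static void main(String[] args) {
        Field f = contentsField(new StringReader("some file contents"));
        System.out.println("storeTermVectors = " + f.fieldType().storeTermVectors());
    }
}
```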

Upvotes: 13
