Reputation: 554
In the indexing method I use the following line:
Field contentsField = new Field("contents", new FileReader(f), Field.TermVector.YES);
However, in Lucene 4.0 this constructor is deprecated and new TextField should be used instead of new Field.
But the problem with TextField is that it doesn't accept a TermVector in its constructors.
Is there a way to include the Term Vector in my indexing in Lucene 4.0 with the new constructors?
Thanks
Upvotes: 11
Views: 7202
Reputation: 33897
I was stumped on this for a while. The other answers here are helpful, but even with them the situation was not obvious to me. So after the light finally went on, I decided to add this answer to make things a bit clearer for the next person.
The Field signature that supports term vectors is deprecated because it uses the Field.TermVector enum, which is itself deprecated as of Lucene 4.0.
In Lucene 4.0, a new method signature was added to the Field class that accepts a FieldType instead. The FieldType class is more flexible than the old enum approach and lets you set even more Field options than were previously available.
Here is an example of how to create a text field, not stored, that supports term vectors by passing a FieldType object when instantiating a Field object.
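// Copy TextField's not-stored settings; the TextField constants themselves are frozen.
FieldType specialTextFieldType = new FieldType(TextField.TYPE_NOT_STORED);
specialTextFieldType.setStoreTermVectors(true);
Document exampleDoc = new Document();
// someData is the String (or Reader) holding the text to index.
exampleDoc.add(new Field("SomeField", someData, specialTextFieldType));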
Upvotes: 0
Reputation: 604
I had the same problem, so I just simply created my own Field:
import java.io.Reader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;

/** A TextField look-alike whose field types also store term vectors and positions. */
public class VecTextField extends Field {

    /* Indexed, tokenized, not stored, with term vectors. */
    public static final FieldType TYPE_NOT_STORED = new FieldType();

    /* Indexed, tokenized, stored, with term vectors. */
    public static final FieldType TYPE_STORED = new FieldType();

    static {
        TYPE_NOT_STORED.setIndexed(true);
        TYPE_NOT_STORED.setTokenized(true);
        TYPE_NOT_STORED.setStoreTermVectors(true);
        TYPE_NOT_STORED.setStoreTermVectorPositions(true);
        TYPE_NOT_STORED.freeze();

        TYPE_STORED.setIndexed(true);
        TYPE_STORED.setTokenized(true);
        TYPE_STORED.setStored(true);
        TYPE_STORED.setStoreTermVectors(true);
        TYPE_STORED.setStoreTermVectorPositions(true);
        TYPE_STORED.freeze();
    }

    /** Creates a new VecTextField with Reader value. */
    public VecTextField(String name, Reader reader, Store store) {
        super(name, reader, store == Store.YES ? TYPE_STORED : TYPE_NOT_STORED);
    }

    /** Creates a new VecTextField with String value. */
    public VecTextField(String name, String value, Store store) {
        super(name, value, store == Store.YES ? TYPE_STORED : TYPE_NOT_STORED);
    }

    /** Creates a new un-stored VecTextField with TokenStream value. */
    public VecTextField(String name, TokenStream stream) {
        super(name, stream, TYPE_NOT_STORED);
    }
}
Hope this helps
Upvotes: 14
Reputation: 9964
TextField is a convenience class for users who need indexed fields without term vectors. If you need term vectors, just use a Field. It takes a few more lines of code, since you first need to create an instance of FieldType, set storeTermVectors and tokenized to true, and then pass this FieldType instance to the Field constructor.
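As a rough sketch of those steps, applied to the Reader-based "contents" field from the question: the class and method names below (TermVectorFieldSketch, termVectorType, contentsField) are made up for illustration, and the sketch assumes it is acceptable to start from a copy of TextField.TYPE_NOT_STORED rather than configure an empty FieldType by hand. A Reader-valued field cannot be stored, which is why the not-stored type is used.

```java
import java.io.Reader;
import java.io.StringReader;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.TextField;

// Hypothetical helper class, not part of Lucene.
public class TermVectorFieldSketch {

    // Copy TextField's indexed, tokenized, not-stored settings and turn
    // term vectors on. A copy is required because the TextField constants
    // are frozen and reject modification.
    public static FieldType termVectorType() {
        FieldType type = new FieldType(TextField.TYPE_NOT_STORED);
        type.setStoreTermVectors(true);
        type.freeze(); // guard against accidental changes later
        return type;
    }

    // Build the "contents" field from a Reader using the generic Field
    // constructor that takes a FieldType.
    public static Field contentsField(Reader reader) {
        return new Field("contents", reader, termVectorType());
    }

    public static void main(String[] args) {
        Document doc = new Document();
        // A StringReader stands in for the FileReader used in the question.
        doc.add(contentsField(new StringReader("some text to index")));
        System.out.println(doc.getField("contents").fieldType().storeTermVectors());
    }
}
```

In the indexing method from the question, the call would then look like contentsField(new FileReader(f)) in place of the deprecated constructor.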
Upvotes: 13