I am working on building a search engine for domain-specific data using Lucene. Lucene is clearly powerful and customizable. Originally I created my own field types and was using those, but then I was getting 0 hits, so I read this and found that I should use text fields. One of my fields is a date and another is a low-cardinality category. I looked through the setters for Field and couldn't figure out what StringField and TextField implied or how I should think about them. Should I use a custom field type for fields that are not strictly textual?
The difference between TextField and StringField is hidden in the class FieldType. FieldType allows you to define fields with custom properties like this:
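import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.IndexOptions;
FieldType type = new FieldType();
type.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS); // default is IndexOptions.NONE, i.e. not indexed
type.setTokenized(true);
type.setStoreTermVectors(true); // only valid for an indexed field
// ... further setters as needed
type.freeze(); // optional: make the type immutable before use
document.add(new Field("fieldName", someString, type));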
So both classes extend Field and simply set a different field type. This gets more confusing because the field type also differs depending on whether the field is stored or not: each class defines a TYPE_STORED and a TYPE_NOT_STORED variant. (Source: Lucene 6.5 source code)
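To see what "a different field type" means concretely, the predefined types can be inspected directly. The following is a minimal sketch assuming Lucene 6.5 (or later) on the classpath; the class name, the `describe` helper and the `textWithTermVectors` custom type are hypothetical, chosen for illustration only.

```java
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;

public class FieldTypeComparison {

    // Hypothetical custom type: copies TextField's stored settings and adds term vectors.
    static FieldType textWithTermVectors() {
        FieldType type = new FieldType(TextField.TYPE_STORED); // copy constructor, result is not frozen
        type.setStoreTermVectors(true);
        type.freeze();
        return type;
    }

    // Prints the properties that distinguish one field type from another.
    static void describe(String name, FieldType type) {
        System.out.printf("%-28s tokenized=%-5b stored=%-5b omitNorms=%-5b indexOptions=%s%n",
                name, type.tokenized(), type.stored(), type.omitNorms(), type.indexOptions());
    }

    public static void main(String[] args) {
        describe("StringField.TYPE_NOT_STORED", StringField.TYPE_NOT_STORED);
        describe("StringField.TYPE_STORED", StringField.TYPE_STORED);
        describe("TextField.TYPE_NOT_STORED", TextField.TYPE_NOT_STORED);
        describe("TextField.TYPE_STORED", TextField.TYPE_STORED);
        describe("custom text + vectors", textWithTermVectors());
    }
}
```

The StringField types should report tokenized=false, omitted norms and IndexOptions.DOCS, while the TextField types are tokenized and indexed with DOCS_AND_FREQS_AND_POSITIONS; within each pair only `stored` changes. Starting a custom type from one of these predefined types, rather than from an empty `new FieldType()` whose index options default to NONE, avoids accidentally creating a field that is never indexed and therefore never matches.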
To make it short:
- StringField for information like IDs, URLs, etc., which won't require any tokenization, stemming or other processing by an Analyzer.
- TextField for information which requires this processing, such as the title or the content of a document.
- StoredField for values that only need to be stored and returned with the search results, without being indexed.
Looking at the documentation, we can see that besides the types Field, StringField and TextField, Lucene offers mostly numeric "Points". Points behave like fields in that they are indexed, but they are not stored (see StoredField above for that).
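The split between StringField, TextField and StoredField in the list above shows up clearly when one document is indexed and then searched with exact TermQuery lookups. This is a minimal sketch assuming Lucene 6.5 or later on the classpath and StandardAnalyzer for analysis; the field names, sample values and the `countHits` helper are hypothetical, and the temporary index directories are not cleaned up.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class StringVsTextField {

    // Indexes one sample document, then counts hits for an exact (unanalyzed) term in the given field.
    static int countHits(String field, String term) {
        try (Directory dir = FSDirectory.open(Files.createTempDirectory("lucene-demo"))) {
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
                Document doc = new Document();
                // StringField: indexed as a single untouched term, the analyzer is not applied
                doc.add(new StringField("url", "https://example.com/Lucene-Guide", Field.Store.YES));
                // TextField: run through the analyzer (split into lowercased tokens)
                doc.add(new TextField("title", "Getting Started With Lucene", Field.Store.YES));
                // StoredField: returned with results, but not searchable
                doc.add(new StoredField("summary", "A short introduction"));
                writer.addDocument(doc);
            }
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                return new IndexSearcher(reader).count(new TermQuery(new Term(field, term)));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countHits("url", "https://example.com/Lucene-Guide")); // whole value: 1
        System.out.println(countHits("url", "lucene"));                          // not tokenized: 0
        System.out.println(countHits("title", "lucene"));                        // analyzed token: 1
        System.out.println(countHits("title", "Lucene"));                        // tokens were lowercased: 0
        System.out.println(countHits("summary", "short"));                       // stored only: 0
    }
}
```

A TermQuery bypasses the analyzer, so it matches a StringField only on the complete original value and a TextField only on the tokens the analyzer produced; in practice a QueryParser built with the same analyzer would be used for the TextField.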
For your date, I would recommend using a LongPoint to index a timestamp, e.g. seconds since the epoch:
document.add(new LongPoint("date", someCalendar.getTimeInMillis() / 1000));
Using a point will later allow you to perform range queries with LongPoint.newRangeQuery, which can be used to retrieve the documents in a given time frame, or applied as an additional filter to an existing query.
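Both uses of LongPoint.newRangeQuery can be sketched together: the range is attached as a FILTER clause to a text query, so it restricts the matches without affecting scoring. This assumes Lucene 6.5 or later, dates converted to UTC epoch seconds (matching the `getTimeInMillis() / 1000` above), and hypothetical helper names, field names and sample documents. A separate StoredField with the same name is added because the point itself is not stored.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.time.LocalDate;
import java.time.ZoneOffset;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class DateRangeDemo {

    // Seconds since the epoch at UTC midnight of the given day.
    static long epochSeconds(int year, int month, int day) {
        return LocalDate.of(year, month, day).atStartOfDay(ZoneOffset.UTC).toEpochSecond();
    }

    static void addDoc(IndexWriter writer, String text, long date) throws IOException {
        Document doc = new Document();
        doc.add(new TextField("content", text, Field.Store.YES));
        doc.add(new LongPoint("date", date));   // indexed for range queries, not stored
        doc.add(new StoredField("date", date)); // stored copy so the value can be read back
        writer.addDocument(doc);
    }

    // Counts documents containing "lucene" whose date lies in [from, to], both bounds inclusive.
    static int countInRange(long from, long to) {
        try (Directory dir = FSDirectory.open(Files.createTempDirectory("lucene-dates"))) {
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
                addDoc(writer, "lucene release notes", epochSeconds(2017, 3, 1));
                addDoc(writer, "lucene tutorial", epochSeconds(2017, 6, 15));
                addDoc(writer, "lucene tips", epochSeconds(2018, 1, 10));
            }
            Query text = new TermQuery(new Term("content", "lucene"));
            Query range = LongPoint.newRangeQuery("date", from, to);
            // FILTER: the range must match but does not contribute to the score
            Query combined = new BooleanQuery.Builder()
                    .add(text, BooleanClause.Occur.MUST)
                    .add(range, BooleanClause.Occur.FILTER)
                    .build();
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                return new IndexSearcher(reader).count(combined);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Documents dated within 2017: prints 2
        System.out.println(countInRange(epochSeconds(2017, 1, 1), epochSeconds(2017, 12, 31)));
    }
}
```

On its own, the range query would simply be passed to the searcher instead of the combined BooleanQuery.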
Regarding your "low cardinality category", I'm not sure what you mean, but if it's a number you could use an IntPoint.
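If the category is encoded as a number, an IntPoint supports both single-value and "any of these values" lookups. The sketch below assumes Lucene 6.5 or later and hypothetical integer codes for the categories (the question does not say how they are encoded); if the categories are plain strings instead, a StringField queried with a TermQuery would be the analogous choice.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.IntPoint;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class CategoryDemo {

    // Hypothetical category codes, for illustration only.
    static final int BOOKS = 1, MUSIC = 2, FILMS = 3;

    // Indexes four documents (two BOOKS, one MUSIC, one FILMS) and counts those in any of the given categories.
    static int countCategory(int... categories) {
        try (Directory dir = FSDirectory.open(Files.createTempDirectory("lucene-categories"))) {
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
                for (int category : new int[] {BOOKS, BOOKS, MUSIC, FILMS}) {
                    Document doc = new Document();
                    doc.add(new IntPoint("category", category)); // indexed, not stored
                    writer.addDocument(doc);
                }
            }
            // newExactQuery for one value, newSetQuery to match any value in the set
            Query query = categories.length == 1
                    ? IntPoint.newExactQuery("category", categories[0])
                    : IntPoint.newSetQuery("category", categories);
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                return new IndexSearcher(reader).count(query);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countCategory(BOOKS));        // 2
        System.out.println(countCategory(MUSIC, FILMS)); // 2
    }
}
```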