Reputation: 54005
I am trying to implement an index of documents (roughly corresponding to DB rows), where one of the fields is an integer. I'm adding them to the index like this:
Document doc = new Document();
doc.add(new StringField("ticket_number", rs.getString("ticket_number"),
        Field.Store.YES));
doc.add(new IntField("ticket_id", rs.getInt("ticket_id"),
        Field.Store.YES));
doc.add(new StringField("id_s", rs.getString("ticket_id"),
        Field.Store.YES));
w.addDocument(doc);
It seems I can't query the ticket_id field at all, while id_s works just fine.
One of the documents is (I added whitespace for readability):
Document<
stored,indexed,tokenized,omitNorms,indexOptions=DOCS_ONLY<ticket_number:230114W>
stored<ticket_id:152>
stored,indexed,tokenized,omitNorms,indexOptions=DOCS_ONLY<id_s:152>>
So my int field is stored, but not indexed. This query works as expected: id_s:152, while this one never returns anything: ticket_id:152.
What am I doing wrong? How can I add such a field to the index and make it searchable?
Upvotes: 17
Views: 15123
Reputation: 143
Another answer comes from this thread (third answer): Lucene 4.0 IndexWriter updateDocument for Numeric Term
Basically, you create a Term with your int value like this:
String field = "myfield";
int value = 4711;
BytesRef bytes = new BytesRef(NumericUtils.BUF_SIZE_INT);
NumericUtils.intToPrefixCoded(value, 0, bytes);
Term term = new Term(field, bytes);
Then you can use this term for searching, or for deleting/updating documents in your index. In a first test, this worked fine for me, though I can't tell whether this is the "right" way to do things. I've used NumericRangeFilter before for filtering IntFields, but now I'm inclined to use this approach with a regular TermsFilter or TermQuery instead.
Upvotes: 9
Reputation: 26703
Below works for me:
RAMDirectory idx = new RAMDirectory();
IndexWriter writer = new IndexWriter(
    idx,
    new IndexWriterConfig(Version.LUCENE_40, new ClassicAnalyzer(Version.LUCENE_40))
);
Document document = new Document();
document.add(new StringField("ticket_number", "t123", Field.Store.YES));
document.add(new IntField("ticket_id", 234, Field.Store.YES));
document.add(new StringField("id_s", "234", Field.Store.YES));
writer.addDocument(document);
writer.commit();
IndexReader reader = DirectoryReader.open(idx);
IndexSearcher searcher = new IndexSearcher(reader);
Query q1 = new TermQuery(new Term("id_s", "234"));
TopDocs td1 = searcher.search(q1, 1);
System.out.println(td1.totalHits); // prints "1"
Query q2 = NumericRangeQuery.newIntRange("ticket_id", 1, 234, 234, true, true);
TopDocs td2 = searcher.search(q2, 1);
System.out.println(td2.totalHits); // prints "1"
As femtoRgon pointed out, for numeric values (longs, dates, floats, etc.) you need to use a NumericRangeQuery and specify the precision step. Otherwise Lucene has no idea how you want to define similarity.
Upvotes: 19
Reputation: 33341
Numeric Fields can be queried with a NumericRangeQuery. For an exact match, simply set the max and min to equal values.
Your output indicating the field is not indexed could be due to the differences in how a numeric value is indexed, compared to a text value. Considering that the field is transformed into Lucene's numeric representation, the literal value 152 will indeed not be indexed.
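To make that concrete, here is a rough, standalone sketch of a prefix-coded integer encoding, in the spirit of what NumericUtils does. The class PrefixCodedSketch, its encode method, and the marker constant are hypothetical names chosen for illustration, and the byte layout is a simplification. It is not Lucene's API and should not be used to build real terms. It only shows that the bytes a numeric field would index look nothing like the text "152".

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; not part of Lucene.
class PrefixCodedSketch {
    // Assumed marker for the leading byte that records the shift (illustrative value).
    static final int SHIFT_START_INT = 0x60;

    // Encodes an int as: one shift byte, then the sortable bits in 7-bit chunks.
    static byte[] encode(int value, int shift) {
        if (shift < 0 || shift > 31) {
            throw new IllegalArgumentException("shift must be 0..31");
        }
        int nChars = (31 - shift) / 7 + 1;          // number of 7-bit chunks needed
        byte[] out = new byte[nChars + 1];          // +1 for the shift byte
        out[0] = (byte) (SHIFT_START_INT + shift);
        // Flip the sign bit so that signed order matches unsigned byte order.
        int sortableBits = (value ^ 0x80000000) >>> shift;
        for (int i = nChars; i > 0; i--) {
            out[i] = (byte) (sortableBits & 0x7f);  // lowest 7 bits go last
            sortableBits >>>= 7;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] numericTerm = encode(152, 0);
        byte[] textTerm = "152".getBytes(StandardCharsets.UTF_8);
        System.out.println("numeric: " + Arrays.toString(numericTerm));
        System.out.println("text:    " + Arrays.toString(textTerm));
        System.out.println("equal?   " + Arrays.equals(numericTerm, textTerm)); // false
    }
}
```

A TermQuery or a parsed query such as ticket_id:152 looks for the text form, which is why it comes back empty against a field indexed this way.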
At a glance, however, it's possible that your handling of id_s may be the better alternative. IDs are not usually handled as numeric values, but rather as just simple identifiers that happen to be represented with digits. If you don't need numeric sorting or range querying on the field, indexing as a StringField certainly makes more sense.
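As a quick way to check whether you do need numeric behaviour, the sketch below uses plain Java (no Lucene) to show how the same IDs order as text versus as numbers. IdOrderingSketch and its methods are hypothetical helper names for illustration only. If the text ordering is acceptable for your use case, the string field is likely enough.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; not part of Lucene.
class IdOrderingSketch {
    // Orders IDs the way plain string terms compare: character by character.
    static List<String> sortAsText(List<String> ids) {
        List<String> copy = new ArrayList<>(ids);
        Collections.sort(copy);
        return copy;
    }

    // Orders IDs by their integer value.
    static List<String> sortAsNumbers(List<String> ids) {
        List<String> copy = new ArrayList<>(ids);
        copy.sort((a, b) -> Integer.compare(Integer.parseInt(a), Integer.parseInt(b)));
        return copy;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("152", "23", "9");
        System.out.println(sortAsText(ids));    // [152, 23, 9]
        System.out.println(sortAsNumbers(ids)); // [9, 23, 152]
    }
}
```

The same mismatch applies to ranges: as text, "23" falls outside the range "100" to "200" only by accident of its first character, so range queries over digit strings are not reliable.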
Upvotes: 8