Konrad Garus

Reputation: 54005

How to search an int field in Lucene 4?

I am trying to implement an index of documents (roughly corresponding to DB rows), where one of the fields is an integer. I'm adding them to the index like this:

Document doc = new Document();
doc.add(new StringField("ticket_number", rs.getString("ticket_number"),
        Field.Store.YES));
doc.add(new IntField("ticket_id", rs.getInt("ticket_id"),
        Field.Store.YES));
doc.add(new StringField("id_s", rs.getString("ticket_id"),
        Field.Store.YES));
w.addDocument(doc);

It seems I can't query the ticket_id field at all, while id_s works just fine.

One of the documents is (I added whitespace for readability):

Document<
    stored,indexed,tokenized,omitNorms,indexOptions=DOCS_ONLY<ticket_number:230114W> 
    stored<ticket_id:152> 
    stored,indexed,tokenized,omitNorms,indexOptions=DOCS_ONLY<id_s:152>>

So my int field is stored, but not indexed. This query works as expected: id_s:152, while this one never returns anything: ticket_id:152.

What am I doing wrong? How can I add such a field to the index and make it searchable?

Upvotes: 17

Views: 15123

Answers (3)

D.Ogranos

Reputation: 143

Another answer comes from this thread (third answer): Lucene 4.0 IndexWriter updateDocument for Numeric Term

Basically, you create a Term with your int value like this:

String field = "myfield";
int value = 4711;
BytesRef bytes = new BytesRef(NumericUtils.BUF_SIZE_INT);
NumericUtils.intToPrefixCoded(value, 0, bytes);
Term term = new Term(field, bytes);

Then you can use this term for searching, or for deleting/updating documents in your index. In a first test, this worked fine for me. I can't tell whether this is the "right" way to do things, however. I've used NumericRangeFilter before for filtering IntFields, but now I'm inclined to use this approach with a regular TermsFilter or TermQuery instead.

Upvotes: 9

mindas

Reputation: 26703

Below works for me:

    RAMDirectory idx = new RAMDirectory();
    IndexWriter writer = new IndexWriter(
            idx,
            new IndexWriterConfig(Version.LUCENE_40, new ClassicAnalyzer(Version.LUCENE_40))
    );
    Document document = new Document();
    document.add(new StringField("ticket_number", "t123", Field.Store.YES));
    document.add(new IntField("ticket_id", 234, Field.Store.YES));
    document.add(new StringField("id_s", "234", Field.Store.YES));
    writer.addDocument(document);
    writer.commit();

    IndexReader reader = DirectoryReader.open(idx);
    IndexSearcher searcher = new IndexSearcher(reader);

    Query q1 = new TermQuery(new Term("id_s", "234"));
    TopDocs td1 = searcher.search(q1, 1);
    System.out.println(td1.totalHits);  // prints "1"

    Query q2 = NumericRangeQuery.newIntRange("ticket_id", 1, 234, 234, true, true);
    TopDocs td2 = searcher.search(q2, 1);
    System.out.println(td2.totalHits);  // prints "1"

As femtoRgon pointed out, numeric values (longs, dates, floats, etc.) have to be queried with a NumericRangeQuery, and you have to specify the precision step. Otherwise Lucene has no way of knowing how you want the value to be matched.
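To give a rough idea of what the precision step controls: as far as I understand it, a numeric field is indexed as several terms, one per shift, with each term dropping more low-order bits. The sketch below is plain Java rather than Lucene code. The names `PrecisionStepSketch` and `shifts` are made up for illustration, and the real encoding does more (for example, it flips the sign bit before shifting).

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: NOT Lucene's implementation, just the shift arithmetic.
final class PrecisionStepSketch {

    /** Shift values at which a 32-bit int would get a term for the given precision step. */
    static List<Integer> shifts(int precisionStep) {
        if (precisionStep < 1) {
            throw new IllegalArgumentException("precisionStep must be >= 1");
        }
        List<Integer> out = new ArrayList<>();
        for (int shift = 0; shift < 32; shift += precisionStep) {
            out.add(shift);
        }
        return out;
    }

    public static void main(String[] args) {
        int value = 234;
        // One line per term: the lower `shift` bits of the value are dropped.
        for (int shift : shifts(4)) {
            System.out.println("shift=" + shift + " prefix=" + (value >>> shift));
        }
    }
}
```

A smaller step means more terms per value and fewer terms to visit for a range query; a larger step means the opposite. The term at shift 0 always carries the full value, which is what an exact match (min equal to max) relies on.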

Upvotes: 19

femtoRgon

Reputation: 33341

Numeric Fields can be queried with a NumericRangeQuery. For an exact match, simply set the max and min to equal values.

Your output indicating that the field is not indexed could be due to the differences in how a numeric value is indexed, compared to a text value. Since the field is transformed into Lucene's numeric representation, the literal value 152 will indeed not be indexed.
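To illustrate why a text term `152` cannot match a numerically encoded value, here is a simplified sketch in plain Java. It is not Lucene's actual encoding: `NumericTermSketch`, `encode`, and `compare` are hypothetical names, and the byte layout (sign bit flipped, 7 bits per byte) is only loosely modeled on the idea of a sortable binary term.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustration only: a simplified sortable encoding, NOT Lucene's real format.
final class NumericTermSketch {

    /** Encodes an int into 5 bytes of 7 bits each, so that byte order matches numeric order. */
    static byte[] encode(int value) {
        int sortable = value ^ 0x80000000; // flip the sign bit so negatives sort first
        byte[] out = new byte[5];          // 32 bits at 7 bits per byte
        for (int i = out.length - 1; i >= 0; i--) {
            out[i] = (byte) (sortable & 0x7f);
            sortable >>>= 7;
        }
        return out;
    }

    /** Unsigned lexicographic comparison of two byte arrays. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // The encoded number and the text "152" are different byte sequences.
        System.out.println(Arrays.toString(encode(152)));
        System.out.println(Arrays.toString("152".getBytes(StandardCharsets.UTF_8)));
    }
}
```

The point is only that the indexed bytes for the number differ from the bytes of the string "152", so a query built from the text form looks up a term that does not exist in the index.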

At a glance, however, it's possible that your handling of id_s may be the better alternative. IDs are not usually handled as numeric values, but rather as just simple identifiers that happen to be represented with digits. If you don't need numeric sorting or range querying on the field, indexing as a StringField certainly makes more sense.
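A quick way to check whether you need numeric semantics is to look at how the IDs would sort as strings. The sketch below is plain Java with made-up names (`IdSortSketch`, `sortAsStrings`, `sortAsNumbers`) and sample IDs; it only shows the ordering difference, not anything Lucene-specific.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Illustration only: string ordering versus numeric ordering of ID values.
final class IdSortSketch {

    /** Sorts IDs the way plain string terms compare: character by character. */
    static List<String> sortAsStrings(List<String> ids) {
        List<String> copy = new ArrayList<>(ids);
        Collections.sort(copy);
        return copy;
    }

    /** Sorts the same IDs by their integer value. */
    static List<String> sortAsNumbers(List<String> ids) {
        List<String> copy = new ArrayList<>(ids);
        copy.sort(Comparator.comparingInt(Integer::parseInt));
        return copy;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("152", "9", "10");
        System.out.println(sortAsStrings(ids)); // [10, 152, 9]
        System.out.println(sortAsNumbers(ids)); // [9, 10, 152]
    }
}
```

If the string ordering is acceptable and you only ever look IDs up by exact value, a StringField is the simpler choice.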

Upvotes: 8
