Reputation: 53
I use Lucene 6.1.0 to index elements with a name and a value.
E.g.
<documents>
<Document>
<field name="NAME" value="Long_-1"/>
<field name="VALUE" value="-1"/>
</Document>
<Document>
<field name="NAME" value="Double_-1.0"/>
<field name="VALUE" value="-1.0"/>
</Document>
<Document>
<field name="NAME" value="Double_-0.5"/>
<field name="VALUE" value="-0.5"/>
</Document>
<Document>
<field name="NAME" value="Long_0"/>
<field name="VALUE" value="0"/>
</Document>
<Document>
<field name="NAME" value="Double_0.0"/>
<field name="VALUE" value="0.0"/>
</Document>
<Document>
<field name="NAME" value="Double_0.5"/>
<field name="VALUE" value="0.5"/>
</Document>
<Document>
<field name="NAME" value="Long_1"/>
<field name="VALUE" value="1"/>
</Document>
<Document>
<field name="NAME" value="Double_1.0"/>
<field name="VALUE" value="1.0"/>
</Document>
<Document>
<field name="NAME" value="Double_1.5"/>
<field name="VALUE" value="1.5"/>
</Document>
<Document>
<field name="NAME" value="Long_2"/>
<field name="VALUE" value="2"/>
</Document>
</documents>
Following the documentation, I use LongPoint and DoublePoint to build the index:
import org.apache.lucene.document.Document;
import org.apache.lucene.document.DoublePoint;
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.document.StoredField;

public static void addLongField(String name, long value, Document doc) {
    doc.add(new LongPoint(name, value));
    // Since Lucene 6.x a second field is required to store the value.
    doc.add(new StoredField(name, value));
}
public static void addDoubleField(String name, double value, Document doc) {
    doc.add(new DoublePoint(name, value));
    // Since Lucene 6.x a second field is required to store the value.
    doc.add(new StoredField(name, value));
}
Since I use the same field for long and double values, I get strange results for my RangeQuery if the min and max value have different signs.
LongPoint.newRangeQuery(field, minValue, maxValue);
DoublePoint.newRangeQuery(field, minValue, maxValue);
This example is correct:
VALUE:[1 TO 1] VALUE:[0.5 TO 1.0]
Results in:
0.5 Double_0.5
1 Long_1
1.0 Double_1.0
This example is erroneous:
VALUE:[0 TO 1] VALUE:[-0.5 TO 1.0]
Results in:
0 Long_0
0.0 Double_0.0
1 Long_1
-1 Long_-1
-0.5 Double_-0.5
0.5 Double_0.5
1.0 Double_1.0
2 Long_2
In addition to the correct results, all long values are returned.
Does anybody know why?
Is it not possible to store long and double values in the same field?
Thank you very much.
BR Tobias
Upvotes: 3
Views: 737
Reputation: 33351
No, you should not be keeping different data types in the same field. You should either put them in separate fields, or convert your longs into doubles (or vice versa), so that they are all indexed in the same format.
To understand what is going on, it helps to understand what the numeric fields are really doing. Numeric fields are encoded in a binary representation that facilitates range searching for that type. The encodings for integral types and for floating point types are not comparable with each other. For example, the number 1 is encoded as a different byte sequence when indexed as a long than when indexed as a double.
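To make the difference concrete, here is a minimal plain-Java sketch of the two encodings. It does not call Lucene; the helper methods are hypothetical re-implementations written on the assumption that they mirror what NumericUtils.longToSortableBytes and NumericUtils.doubleToSortableLong do in Lucene 6.x (flip the sign bit and write big-endian, after remapping the double's bits so that they sort numerically). Treat it as an illustration rather than the library's code.

```java
import java.nio.ByteBuffer;

class SortableEncodingDemo {
    // Assumption: mirrors Lucene's long encoding (flip the sign bit, big-endian bytes).
    static byte[] longToSortableBytes(long value) {
        return ByteBuffer.allocate(Long.BYTES)
                .putLong(value ^ 0x8000000000000000L)
                .array();
    }

    // Assumption: mirrors Lucene's double-to-sortable-long mapping.
    // Negative doubles get all bits except the sign bit inverted so they sort correctly.
    static long doubleToSortableLong(double value) {
        long bits = Double.doubleToLongBits(value);
        return bits ^ ((bits >> 63) & 0x7fffffffffffffffL);
    }

    // A double is first mapped to a sortable long, then encoded like any other long.
    static byte[] doubleToSortableBytes(double value) {
        return longToSortableBytes(doubleToSortableLong(value));
    }

    // Render bytes as space-separated two-digit hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("long   1   -> " + hex(longToSortableBytes(1L)));
        System.out.println("double 1.0 -> " + hex(doubleToSortableBytes(1.0)));
    }
}
```

Under these assumptions the long 1 comes out as 80 00 00 00 00 00 00 01 and the double 1.0 as bf f0 00 00 00 00 00 00, so the same number lands in very different places in the index depending on its type.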
These BytesRef binary representations are what is actually being searched. Since one part of your query runs from double -0.5 to 1.0, the indexed long values are effectively being compared against the encoded bytes of those double bounds.
That range doesn't include just a few extra hits outside the expected range of long values; it covers most long values, leaving out only the really high and low reaches (you'd need to be getting into the neighborhood of Long.MAX_VALUE/2).
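The effect on the long values can be sketched in plain Java as well. The snippet below again assumes that the hypothetical helper doubleToSortableLong matches Lucene's mapping, and that both point types share the same final byte encoding, so that comparing the encoded bytes is equivalent to comparing signed longs. Given that, a double range matches every indexed long that lies between the sortable-long forms of its two bounds.

```java
class DoubleRangeAsLongRange {
    // Assumption: mirrors Lucene's double-to-sortable-long mapping.
    static long doubleToSortableLong(double value) {
        long bits = Double.doubleToLongBits(value);
        return bits ^ ((bits >> 63) & 0x7fffffffffffffffL);
    }

    // True if a value indexed as a long would fall inside a double range query
    // on the same field, under the shared-encoding assumption described above.
    static boolean longMatchesDoubleRange(long indexed, double min, double max) {
        return indexed >= doubleToSortableLong(min)
            && indexed <= doubleToSortableLong(max);
    }

    public static void main(String[] args) {
        long lower = doubleToSortableLong(-0.5);
        long upper = doubleToSortableLong(1.0);
        System.out.println("equivalent long range: [" + lower + " TO " + upper + "]");
        System.out.println("Long.MAX_VALUE / 2:    " + (Long.MAX_VALUE / 2));

        // The long values from the question, plus the two extremes.
        long[] samples = {-1L, 0L, 1L, 2L, Long.MIN_VALUE, Long.MAX_VALUE};
        for (long v : samples) {
            System.out.println(v + " matched by [-0.5 TO 1.0]: "
                    + longMatchesDoubleRange(v, -0.5, 1.0));
        }
    }
}
```

With these assumptions, -1, 0, 1 and 2 all fall inside the range, which agrees with the results in the question, while Long.MIN_VALUE and Long.MAX_VALUE fall outside it. The same check for [0.5 TO 1.0] rejects the small longs, because both bounds then map to large positive values, which is why the query only misbehaves when the bounds have different signs.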
Upvotes: 3