Reputation: 1579
I'm adding a DocValue to a document with
doc.add(new BinaryDocValuesField("foo",new BytesRef("bar")));
To retrieve that value for a specific document with ID docId, I call
DocValues.getBinary(reader,"foo").get(docId).utf8ToString();
The get function in BinaryDocValues is supported up to Lucene 6.6, but in Lucene 7.0 and later it no longer seems to be available.
So, how do I get the DocValue by document ID in Lucene 7+ (without having to iterate over BinaryDocValues / DocIdSetIterator, and without having to re-get BinaryDocValues and use advanceExact every time)?
Upvotes: 2
Views: 2686
Reputation: 2924
Doc values are Lucene's column-stride field value storage. They were originally intended to be fast for random access at query time, mainly for faceting and sorting. LUCENE-7407 switched the access pattern from random access to an iterator. Because an iterator API is a much more restrictive access pattern than arbitrary random access, this change gives Lucene more freedom to use aggressive compression and other optimizations.
You can read about this change in the following blogs:
In practice, this change degrades performance in some cases, for example SOLR-9599. For the main use cases (faceting and sorting), an iterator API is fine when used properly and even allows some optimizations. However, there are many cases where this API is not a good fit. All of those cases were dismissed as incorrect usage (the same problem we had in the Java world with sun.misc.Unsafe).
That said, org.apache.lucene.index.DocValuesIterator#advanceExact is quite fast and, for some implementations, has performance and complexity similar to random access.
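To make the access pattern concrete, here is a minimal sketch of a lookup helper that reuses one iterator while document IDs arrive in non-decreasing order and only re-obtains it when a lookup goes backwards. ExactValues, ExactValuesSource and DocValueLookup are hypothetical names introduced for illustration, not Lucene classes; ExactValues merely stands in for the two BinaryDocValues methods involved, so the sketch compiles without Lucene on the classpath.

```java
import java.io.IOException;

/** Hypothetical stand-in for the part of BinaryDocValues used here (not a Lucene type). */
interface ExactValues {
    /** Moves to target; target must not be smaller than the current doc ID. */
    boolean advanceExact(int target) throws IOException;

    /** Value of the current document; only valid after advanceExact returned true. */
    String value() throws IOException;
}

/** Hypothetical factory that hands out a fresh iterator positioned before the first document. */
interface ExactValuesSource {
    ExactValues open() throws IOException;
}

/** Emulates a random-access get(docId) on top of a forward-only iterator. */
final class DocValueLookup {
    private final ExactValuesSource source;
    private ExactValues current;   // iterator currently in use, null before the first lookup
    private int lastDoc = -1;      // last doc ID passed to advanceExact
    int reopenCount = 0;           // how often a fresh iterator had to be obtained

    DocValueLookup(ExactValuesSource source) {
        this.source = source;
    }

    /** Returns the value for docId, or null if the document has no value. */
    String get(int docId) throws IOException {
        // Iterators only move forward, so a backwards lookup needs a fresh one.
        if (current == null || docId < lastDoc) {
            current = source.open();
            reopenCount++;
        }
        lastDoc = docId;
        return current.advanceExact(docId) ? current.value() : null;
    }
}
```

With Lucene 7+, the source would presumably wrap DocValues.getBinary(leafReader, "foo"), and value() would return binaryValue().utf8ToString(); note that docId is then relative to that leaf reader, not to the top-level reader. If lookups are sorted by document ID first, the iterator is obtained only once per segment.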
Upvotes: 10