How to get n first terms from field in Lucene 5.x?

Question

I'm using Lucene for an Eclipse plugin. Currently I iterate over my indexed terms like this:

1. I get a Terms instance using IndexReader.getTermVector(id, field).
2. I iterate over this instance using a TermsEnum: while ((text = vectorEnum.next()) != null)

Now what I want additionally is to get the first n elements of a field. I figured I have to use PostingsEnum to accomplish this, but I don't get how to use it. I guess I can get it by calling postings() on my TermsEnum, but I don't know what to do with that.

Edit: This is the relevant part of my code:

Terms vector = indexReader.getTermVector(id, field);
BytesRef text = null;
if (vector != null) {
    TermsEnum vectorEnum = vector.iterator();
    while ((text = vectorEnum.next()) != null) {
        String term = text.utf8ToString();
        // [do stuff]
    }
}

And that's the FieldType:

FieldType fieldType = new FieldType();
fieldType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS);
fieldType.setStored(true);
fieldType.setStoreTermVectors(true);
fieldType.setTokenized(true);

neilireson · Accepted Answer

Not sure why, but requesting positions through setIndexOptions doesn't seem to be enough for term vectors, so you have to call setStoreTermVectorPositions(true) explicitly. You still have to set the index options to something other than NONE, but DOCS_AND_FREQS_AND_POSITIONS doesn't seem to be necessary. Keep setStoreTermVectors(true) from your FieldType and use, for example:

fieldType.setIndexOptions(IndexOptions.DOCS);
fieldType.setStoreTermVectorPositions(true);

Then you can access the positions:

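Terms vector = indexReader.getTermVector(id, field);
if (vector != null) {
    TermsEnum vectorEnum = vector.iterator();
    BytesRef text;
    // Walk every term stored in this document's term vector
    while ((text = vectorEnum.next()) != null) {
        String term = text.utf8ToString();
        // Ask for positions; they are only present if setStoreTermVectorPositions(true) was set
        PostingsEnum postings = vectorEnum.postings(null, PostingsEnum.POSITIONS);
        while (postings.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
            // freq() is the number of occurrences, so call nextPosition() that many times
            int freq = postings.freq();
            while (freq-- > 0) {
                logger.info("Position: {}", postings.nextPosition());
            }
        }
    }
}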
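
If "first n terms" means the first n terms in the order they occur in the text, the positions read in the loop above can be sorted to recover that order. The following is only a sketch under assumptions: FirstTerms, firstN and the positionsByTerm map are hypothetical names, not part of Lucene, and the map is assumed to have been filled in the loop above with each term and the values returned by postings.nextPosition().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene.
final class FirstTerms {

    // Returns the terms found at the n lowest positions, in document order.
    // positionsByTerm is assumed to map each term to all of its positions.
    static List<String> firstN(Map<String, int[]> positionsByTerm, int n) {
        // A TreeMap keeps its keys (the positions) sorted in ascending order.
        // If two terms share a position (e.g. synonyms), only one of them is kept.
        TreeMap<Integer, String> byPosition = new TreeMap<>();
        for (Map.Entry<String, int[]> entry : positionsByTerm.entrySet()) {
            for (int position : entry.getValue()) {
                byPosition.put(position, entry.getKey());
            }
        }

        // Take terms from the sorted map until n have been collected.
        List<String> result = new ArrayList<>();
        for (String term : byPosition.values()) {
            if (result.size() >= n) {
                break;
            }
            result.add(term);
        }
        return result;
    }
}
```

A term that occurs several times shows up once per occurrence in the result, because every position is treated as its own entry. If you only need the first n terms in the enumeration order of the TermsEnum (sorted by term, not by position), no positions are needed and you can simply stop the outer while loop after n iterations.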
