Filtering term count in Lucene (Java)

Question

I'm trying to count the number of occurrences of each word in a description field using Lucene. For example:

description: BOX OF APPLES
description: BOX OF BANANAS

output:

BOX 2
OF 2
APPLES 1
BANANAS 1

I am looking to get the word and the frequency.

The catch is that I would like to restrict those results to a given document, i.e. count only the words in the description field of that one document.

Thanks for any assistance given.

In answer to the comment: I have something like this:

public ArrayList GetIndexTerms(String code) {
        try {

            ArrayList

jpountz · Accepted Answer

The problem is that Lucene is an inverted index: it makes it easy to retrieve documents based on terms, whereas you are looking for the opposite, i.e. retrieving terms based on documents.

Fortunately, this is a common problem, and Lucene lets you retrieve the terms of a document (term vectors), provided that you enabled this feature at indexing time.

See TermVector.YES and the Field constructor for how to enable term vectors at indexing time, and IndexReader for how to retrieve them at search time.

Alternatively, you could re-analyze a stored field on the fly, but this may be slower, especially on large fields.
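
To give a rough idea of the re-analysis approach, here is a minimal sketch in plain Java that counts the terms of a single description value. It does not use the Lucene API: the whitespace split and upper-casing are only an assumed stand-in for whatever Analyzer you used at indexing time, and the class name DescriptionTermCounter and the sample text are hypothetical. In a real setup you would first load the stored description of the document you are interested in and run it through the same Analyzer as the index.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class DescriptionTermCounter {

    // Counts how often each token occurs in one document's description text.
    // Assumption: splitting on whitespace and upper-casing stands in for the
    // Lucene Analyzer that was used at indexing time.
    static Map<String, Integer> countTerms(String description) {
        // LinkedHashMap keeps terms in order of first appearance.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : description.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // an empty description yields a single empty token
            }
            counts.merge(token.toUpperCase(Locale.ROOT), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical stored value of the description field for one document.
        Map<String, Integer> counts = countTerms("BOX OF APPLES IN A BOX");
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            System.out.println(entry.getKey() + " " + entry.getValue());
        }
    }
}
```

Because only one document's text is processed, the counts are restricted to that document by construction, which is the filtering asked about in the question. The cost is that the text is tokenized again on every call, which is why term vectors are usually the better choice for large fields.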
