Alex Torrisi

Reputation: 97

Number of PhraseQuery matches in a document

This is my code to perform a PhraseQuery using Lucene. While it is clear how to get the score for each matching document in the index, I do not understand how to extract the total number of matches within a single document. The following code performs the query:

        PhraseQuery.Builder builder = new PhraseQuery.Builder();

        builder.add(new Term("contents", "word1"), 0);
        builder.add(new Term("contents", "word2"), 1);
        builder.add(new Term("contents", "word3"), 2);
        builder.setSlop(3);
        PhraseQuery pq = builder.build();

        int hitsPerPage = 10;
        IndexReader reader = DirectoryReader.open(index);
        IndexSearcher searcher = new IndexSearcher(reader);

        TopDocs docs = searcher.search(pq, hitsPerPage);

        ScoreDoc[] hits = docs.scoreDocs;

        System.out.println("Found " + hits.length + " hits.");

        for(int i=0;i<hits.length;++i)
        {
            int docId = hits[i].doc;
            Document d = searcher.doc(docId);
            System.out.println(docId + " " + hits[i].score);
        }

Is there a method to extract the total number of matches for each document rather than the score?

Upvotes: 0

Views: 137

Answers (1)

vahid

Reputation: 141

Approach A. This might not be the best way, but it will give you a quick insight. You can use the explain() method of the IndexSearcher class, which returns an Explanation object; printing it shows a lot of scoring information, including the phrase frequency in the document. Add this code inside your for loop:

System.out.println(searcher.explain(pq, docId));
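
If you need the number rather than the printed text, one option is to pull it out of the explanation string. The following is only a minimal sketch: the class name PhraseFreqParser is hypothetical, and it assumes the explanation text contains a label of the form "phraseFreq=<number>", which you should check against the output of your own Lucene version. Also note that with a non-zero slop the reported frequency may be a weighted (fractional) value rather than a plain count of occurrences.

```java
import java.util.OptionalDouble;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Lucene): extracts the phrase frequency
// from the text produced by searcher.explain(query, docId).toString().
final class PhraseFreqParser {

    // Assumption: the explanation text labels the value as "phraseFreq=<number>".
    // Verify this label against what your Lucene version actually prints.
    private static final Pattern PHRASE_FREQ =
            Pattern.compile("phraseFreq=([0-9]+(?:\\.[0-9]+)?(?:[eE][-+]?[0-9]+)?)");

    private PhraseFreqParser() {
    }

    // Returns the first phraseFreq value found, or an empty result if the
    // label does not appear in the given text.
    static OptionalDouble extractPhraseFreq(String explanationText) {
        if (explanationText == null) {
            return OptionalDouble.empty();
        }
        Matcher m = PHRASE_FREQ.matcher(explanationText);
        if (m.find()) {
            return OptionalDouble.of(Double.parseDouble(m.group(1)));
        }
        return OptionalDouble.empty();
    }
}
```

Inside the for loop this could be called as PhraseFreqParser.extractPhraseFreq(searcher.explain(pq, docId).toString()). Parsing text is fragile, so treat it as a quick diagnostic rather than a stable API.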

Approach B. A more systematic way is to do the same thing that explain() does. To compute the phrase frequency, explain() builds a scorer object for the phrase query and calls freq() on it. Most of the methods and classes involved are private or protected, so I am not sure whether you can use them directly. However, it might be helpful to look at the code of explain() in the PhraseWeight class inside PhraseQuery, and at the ExactPhraseScorer class. Some of these classes are not public, so you will need to download the Lucene source code to see them.

Upvotes: 1
