Reputation: 97
This is my code to perform a PhraseQuery using Lucene. While it is clear how to get the score for each matching document in the index, I do not understand how to extract the total number of matches within a single document. The following code performs the query:
PhraseQuery.Builder builder = new PhraseQuery.Builder();
builder.add(new Term("contents", "word1"), 0);
builder.add(new Term("contents", "word2"), 1);
builder.add(new Term("contents", "word3"), 2);
builder.setSlop(3);
PhraseQuery pq = builder.build();

int hitsPerPage = 10;
IndexReader reader = DirectoryReader.open(index);
IndexSearcher searcher = new IndexSearcher(reader);
TopDocs docs = searcher.search(pq, hitsPerPage);
ScoreDoc[] hits = docs.scoreDocs;

System.out.println("Found " + hits.length + " hits.");
for (int i = 0; i < hits.length; ++i) {
    int docId = hits[i].doc;
    Document d = searcher.doc(docId);
    System.out.println(docId + " " + hits[i].score);
}
Is there a method to extract the total number of matches for each document rather than the score?
Upvotes: 0
Views: 137
Reputation: 141
Approach A. This might not be the best way, but it gives you quick insight. You can use the explain() method of the IndexSearcher class, which returns an Explanation object. Printing it shows a lot of scoring information, including the phrase frequency in the document. Note that explain() takes the document ID, not the Document itself. Add this line inside your for loop:
System.out.println(searcher.explain(pq, docId));
Approach B. A more systematic way is to do the same thing that explain() does. To compute the phrase frequency, explain() builds a scorer object for the phrase query and calls freq() on it. Most of the methods and classes involved are private or protected, so I am not sure you can use them directly. However, it may help to look at the code of explain() in the PhraseWeight class inside PhraseQuery, and at the ExactPhraseScorer class. (Some of these classes are not public, so you need to download the source code to see them.)
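To make the idea of a per-document phrase frequency concrete, here is a minimal plain-Java sketch of what an exact phrase count means: the number of positions in a document's token sequence where the phrase terms appear consecutively. This is not Lucene's implementation. The class PhraseFreqSketch and the method countExactPhrase are hypothetical names made up for illustration, and the sketch assumes a slop of 0, so with a non-zero slop (as in the question) the value Lucene reports may differ.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: NOT Lucene code. Hypothetical class and method names.
// Assumes exact phrase matching (slop = 0) over an already tokenized document.
class PhraseFreqSketch {

    // Counts the start positions at which `phrase` occurs as a contiguous run in `tokens`.
    static int countExactPhrase(List<String> tokens, List<String> phrase) {
        if (phrase.isEmpty() || phrase.size() > tokens.size()) {
            return 0;
        }
        int freq = 0;
        for (int start = 0; start + phrase.size() <= tokens.size(); start++) {
            // subList gives a view of the window; List.equals compares element by element.
            if (tokens.subList(start, start + phrase.size()).equals(phrase)) {
                freq++;
            }
        }
        return freq;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList(
                "word1", "word2", "word3", "other", "word1", "word2", "word3");
        List<String> phrase = Arrays.asList("word1", "word2", "word3");
        System.out.println(countExactPhrase(doc, phrase)); // prints 2
    }
}
```

The number printed here corresponds to what a phrase frequency conceptually counts for one document when no slop is allowed, whereas hits.length in the question only counts matching documents.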
Upvotes: 1