Boris Callens

Reputation: 93357

Get query terms from Lucene query for highlighting

My Lucene queries usually consist of a bunch of AND-combined fields. Is it possible to get the queried fields back out of the Query object?

Upvotes: 4

Views: 2658

Answers (1)

Tomer Gabel

Reputation: 4112

Did you mean extracting the terms or the field names? Since you already know you're handling a BooleanQuery, you can extract the fields by iterating over the BooleanClause array returned by BooleanQuery.getClauses(), rewriting each clause's query to its base form (Query.rewrite) and recursing until you have a TermQuery on your hands. The field name is then available from its term (TermQuery.getTerm().field()).
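The recursion can be sketched roughly as follows. To keep the example runnable without a Lucene jar, TermNode and BoolNode are hypothetical stand-ins for TermQuery and BooleanQuery; they only model the shape of the query tree and are not Lucene classes. Against real Lucene you would walk BooleanQuery.getClauses() and BooleanClause.getQuery() instead, and the exact calls depend on your Lucene version.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins for Lucene's Query, TermQuery and BooleanQuery.
// They only model the tree shape; they are NOT the Lucene classes.
class QueryFieldsSketch {

    interface Node {}

    // Leaf: one term in one field, playing the role of a TermQuery.
    static final class TermNode implements Node {
        final String field;
        final String text;
        TermNode(String field, String text) { this.field = field; this.text = text; }
    }

    // Inner node: a list of clauses, playing the role of a BooleanQuery.
    static final class BoolNode implements Node {
        final List<Node> clauses;
        BoolNode(Node... clauses) { this.clauses = Arrays.asList(clauses); }
    }

    // Walk the tree depth-first and record the field of every leaf.
    static Set<String> collectFields(Node node) {
        Set<String> fields = new LinkedHashSet<>();
        collect(node, fields);
        return fields;
    }

    private static void collect(Node node, Set<String> fields) {
        if (node instanceof TermNode) {
            fields.add(((TermNode) node).field);
        } else if (node instanceof BoolNode) {
            for (Node clause : ((BoolNode) node).clauses) {
                collect(clause, fields);
            }
        }
        // Any other node type is ignored in this sketch.
    }

    public static void main(String[] args) {
        Node query = new BoolNode(
            new TermNode("title", "lucene"),
            new BoolNode(new TermNode("body", "highlight"),
                         new TermNode("title", "query")));
        System.out.println(collectFields(query)); // prints [title, body]
    }
}
```

A LinkedHashSet is used so each field is reported once, in the order it is first seen.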

If you did mean term extraction, I'm not sure about Lucene.NET, but in Java Lucene you can use org.apache.lucene.search.highlight.QueryTermExtractor; you pass a (rewritten) query to one of its getTerms overloads and get back an array of WeightedTerm objects.

As far as I remember, the downsides to using this technique are:

  • Since it internally uses a term set it won't handle multiple instances of the same token, e.g. "dream within a dream"
  • It only supports base query types (TermQuery, BooleanQuery and any other query type which supports Query.extractTerms). I believe we've used it internally for SpanNearQuery and SpanNearOrderedQuery instances, but I may be wrong on this.
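To illustrate the first downside: collecting terms into a set keeps each distinct token only once. The snippet below is not QueryTermExtractor's code, just a plain-Java sketch of the effect; the whitespace split is an assumption standing in for a real analyzer.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class TermSetSketch {

    // Naive whitespace split; a stand-in for a real analyzer (assumption).
    static List<String> tokens(String phrase) {
        return Arrays.asList(phrase.trim().split("\\s+"));
    }

    // Collecting into a set drops repeated tokens.
    static Set<String> uniqueTerms(String phrase) {
        return new LinkedHashSet<>(tokens(phrase));
    }

    public static void main(String[] args) {
        String phrase = "dream within a dream";
        System.out.println(tokens(phrase));      // prints [dream, within, a, dream]
        System.out.println(uniqueTerms(phrase)); // prints [dream, within, a]
    }
}
```

The second occurrence of "dream" is gone from the set, so anything that relies on term counts or positions has to get them from somewhere else.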

Either way I hope this is enough to get you started.

Upvotes: 2
