Reputation: 6005
I'm going to bounty +100 this question when possible, even if it's already answered and accepted
I'm using Lucene 3.2, here's what I have in my index and code:
OR operand in the query phrase (i.e. "my lucene search" becomes "my OR lucene OR search").
MultiFieldQueryParser with Occur.SHOULD in all fields.
What am I trying to reach? A sort of Google-like search, let me explain:
I'm reaching every aspect but this last one. My problems are the following:
This is my actual call to the query parser:
MultiFieldQueryParser.parse(
Version.LUCENE_31,
OrQueryWords, //query words separated with OR operand
searchFields, //String[] searchFields; // all fields
occurs, //Occur[] occurs; {Occur.SHOULD, Occur.SHOULD, etc..}
getFullTextSession().getSearchFactory().getAnalyzer(Product.class)
);
The toString() of this query prints something like this:
(field1:"word1 word2" (field1:word1 field1:word2)) (field2:"word1 word2" (...)) etc.
Right now I'm trying to add the default field (the one containing all other fields) with the query words separated by the AND operand and Occur.MUST:
MultiFieldQueryParser.parse(
Version.LUCENE_31,
AndQueryWords, //query words separated with AND operand
new String[] {"defaultField"},
new Occur[] {Occur.MUST},
getFullTextSession().getSearchFactory().getAnalyzer(Product.class)
);
The toString() of this query prints this:
+(default:"word1 word2" (+default:word1 +default:word2))
How can I intersect both queries? Is there any other solution to reach it?
Upvotes: 0
Views: 1002
Reputation: 9964
I am not sure I understand exactly what you want to achieve, so I am going to give you a few hints on how to customize your scoring when dealing with multi-field, multi-term queries.
Intersection of two queries
You seem to be happy with the result set of your conjunctive query on the default field, and with the scoring of your disjunctive query on all fields. You can get the best of both worlds by using the latter as your main query and the former as a filter.
For example:
Query mainQuery, filterQuery;
BooleanQuery query = new BooleanQuery();
// add the main query for scoring
query.add(mainQuery, Occur.SHOULD);
// prevent the filter query from participating in the scoring
filterQuery.setBoost(0);
// make the filter query required
query.add(filterQuery, Occur.MUST);
Minimum should match clauses
If AND-ing all clauses is too restrictive, and OR-ing all clauses is not restrictive enough, then you could do something in between by setting the minimum number of SHOULD clauses that must match so that a document appears in the resultset.
Then the difficult part is to find the right formula to compute the minimum number of SHOULD clauses which must match for optimal user experience.
For example, let's say you want the ceiling of 3/4 of the SHOULD clauses to match. Starting with a two-clause query and adding clauses up to 5 clauses would yield the following evolution of the number of results.
Anyway, with this feature, the only way for the number of results to shrink as the number of clauses increases is to have a purely conjunctive query.
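To make the 3/4 rule above concrete, here is a minimal sketch of the arithmetic only. The class and method names are made up for illustration, and the 3/4 ratio is just the example value from this answer, not a recommended setting.

```java
public class MinShouldMatch {

    // Hypothetical helper: smallest number of SHOULD clauses that must match,
    // computed as ceil(3/4 * clauseCount) with integer arithmetic.
    static int minimumShouldMatch(int clauseCount) {
        return (3 * clauseCount + 3) / 4;
    }

    public static void main(String[] args) {
        // Print the required number of matching clauses for 2 to 5 clauses.
        for (int n = 2; n <= 5; n++) {
            System.out.println(n + " clauses -> at least "
                    + minimumShouldMatch(n) + " must match");
        }
    }
}
```

The computed value would then be passed to BooleanQuery.setMinimumNumberShouldMatch(int) on the query holding the SHOULD clauses.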
Upvotes: 1
Reputation: 66226
The approach I've used for solving a similar problem is based on limiting the number of results by score.
Unfortunately, Lucene doesn't provide such a feature out of the box, and its developers also discourage this approach (http://wiki.apache.org/lucene-java/ScoresAsPercentages). The main concern is that a score's absolute value is meaningless.
I used the score's relative value for filtering: I picked the highest score, then calculated the minimal accepted score from it (let's say maxScore / 5) and kept only those results which satisfied this criterion.
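A rough sketch of that relative cut-off, written against a plain array of scores so that it stands alone. The class and method names are hypothetical, and it assumes the scores are already sorted in descending order, as they are when hits come back sorted by relevance.

```java
import java.util.ArrayList;
import java.util.List;

public class RelativeScoreFilter {

    // Hypothetical helper: keeps only the scores that reach maxScore / divisor.
    // Assumes scores are sorted in descending order, so scores[0] is the maximum.
    static List<Float> keepRelevant(float[] scores, float divisor) {
        List<Float> kept = new ArrayList<Float>();
        if (scores.length == 0) {
            return kept;
        }
        float threshold = scores[0] / divisor;
        for (float score : scores) {
            if (score < threshold) {
                break; // every remaining score is lower still
            }
            kept.add(score);
        }
        return kept;
    }

    public static void main(String[] args) {
        float[] scores = {2.0f, 1.0f, 0.5f, 0.3f};
        // With a divisor of 5 the threshold is 0.4, so the last hit is dropped.
        System.out.println(keepRelevant(scores, 5f).size() + " results kept");
    }
}
```

With Lucene, the same loop would run over TopDocs.scoreDocs and read each ScoreDoc.score instead of a float array.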
Upvotes: 1