Reputation: 13706
I want to set up a search in Lucene (actually Lucene.NET, but I can convert from Java as necessary) using the following logic:
(field1:A field1:B field1:C)
(+(field1:A) +(field2:B field2:C))
Currently, my code can test whether a given search produces NO results, and ANDs together all the ones that do produce results. But I have no way to stop it before it tests against every field (which unnecessarily limits the results) - it's currently ending up with a query like: (+(field1:A field1:B field1:C) +(field3:A field3:B field3:C)) when I want it to be (+(field1:A field1:C) +(field3:B)). I can't just look at the results from the first search and remove words from the search string, because the Analyzer mangles the words when it parses them for search, and I have no way to un-mangle them to figure out which of the original search terms each one corresponds to.
Any suggestions?
Edit: Ok, generally I prefer describing my problems in the abstract, but I think some part of it is getting lost in the process, so I'll be more concrete.
I'm building a search engine for a site which needs to have several layers of search logic. A few example searches which I'll trace out are:
The index contains documents with seven fields - the relevant ones to this example are:
In general, the logic for each step of the search is as follows:
Each step is something like:
So here's how those three example searches should play out:
+path:headphones +datatype:Category
path:headphones and datatype:Category, leaving "Monster" unmatched
+path:headphones +brand:monster
path:headphones and brand:monster, and no words from the original query are left, so we return all the headphones by Monster.
+(path:monster path:headphones path:white) +datatype:Category
path:headphones and datatype:Category, leaving "White" and "Monster" unmatched
+path:headphones +(brand:monster +brand:white)
path:headphones and brand:monster, leaving "White" unmatched
+path:headphones +brand:monster +keywords:white
+(path:foobar path:headphones path:white) +datatype:Category
path:headphones and datatype:Category, leaving "White" and "Foobar" unmatched
+path:headphones +(brand:foobar +brand:white)
+path:headphones +(keywords:white keywords:foobar)
path:headphones and keywords:white, leaving "Foobar" unmatched
The problem I have is twofold:
+path:headphones +(brand:headphones brand:monster)
+path:headphon +datatype:Taxonomy because I'm mangling it for searching. So I can't take the matched term and just remove that from the original query (because "headphon" != "headphones").
Hopefully that makes it clearer what I'm looking for.
Upvotes: 0
Views: 405
Reputation: 13706
In the end, I built a QueryTree class and stored the queries in a tree structure. It stores a reference to a function that takes a query, a list of terms to pump into that query, whether it should AND or OR those terms, and a list of children (which represent unique combinations of matching terms).
To perform the next level of searching, I just call Evaluate(Func<string, QueryParser.Operator, Query> newQuery) on the deepest nodes in my tree, with a reference to a function which takes terms and an operator and returns the correct Query for that set of logic. The Evaluate function then tests that new query against the list of unmatched terms that have been passed down to it and the result sets of all ancestral queries (by ANDing with the parent, which ANDs with its parent and so on). It then creates children for each set of matching terms, using GetHitTerms, and gives the unmatched terms to the child. Repeat for each level of search.
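For anyone wanting to picture the structure, here is a rough Java sketch of that kind of tree. It is not my actual class: the name QueryTreeNode, its fields, and the hitTerms callback are all made up for illustration, and the callback merely stands in for "run the ANDed query against the index and ask GetHitTerms which terms matched". Queries are built as plain strings so the sketch runs without Lucene.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;
import java.util.function.BiFunction;

// Hypothetical sketch of a query tree node; names and shape are assumptions.
class QueryTreeNode {
    final QueryTreeNode parent;        // null for the root
    final String field;                // field searched at this level; null for the root
    final List<String> matchedTerms;   // terms that hit in this field
    final List<String> unmatchedTerms; // terms handed down to deeper levels
    final List<QueryTreeNode> children = new ArrayList<>();

    QueryTreeNode(QueryTreeNode parent, String field,
                  List<String> matchedTerms, List<String> unmatchedTerms) {
        this.parent = parent;
        this.field = field;
        this.matchedTerms = matchedTerms;
        this.unmatchedTerms = unmatchedTerms;
    }

    // This node's clause ANDed with every ancestor's clause, root first.
    String fullQuery() {
        String up = (parent == null) ? "" : parent.fullQuery();
        if (field == null) return up;
        StringJoiner own = new StringJoiner(" ", "+(", ")");
        for (String term : matchedTerms) own.add(field + ":" + term);
        return up.isEmpty() ? own.toString() : up + " " + own;
    }

    // Runs one search level on the leaves. hitTerms is a stand-in for executing
    // the query and collecting hit terms; each returned group becomes a child
    // that inherits whatever terms the group did not use.
    void evaluate(String levelField,
                  BiFunction<QueryTreeNode, String, List<List<String>>> hitTerms) {
        if (!children.isEmpty()) {
            for (QueryTreeNode child : children) child.evaluate(levelField, hitTerms);
            return;
        }
        if (unmatchedTerms.isEmpty()) return; // nothing left to place
        for (List<String> group : hitTerms.apply(this, levelField)) {
            List<String> rest = new ArrayList<>(unmatchedTerms);
            rest.removeAll(group);
            children.add(new QueryTreeNode(this, levelField, group, rest));
        }
    }

    // Collects the leaves, i.e. the candidate final queries.
    List<QueryTreeNode> leaves() {
        List<QueryTreeNode> out = new ArrayList<>();
        if (children.isEmpty()) out.add(this);
        else for (QueryTreeNode c : children) out.addAll(c.leaves());
        return out;
    }
}
```

A leaf that finds no match at one level simply stays a leaf and gets tried again at the next level, so the tree can end up with leaves at different depths.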
I suspect that there's a better way to do this - I didn't even look into Bobo that Xodarap mentioned, and I never really got faceted searching (as per denis) working. However, it's working, which means it's time to move on to other aspects of the site.
Upvotes: 0
Reputation: 11849
I don't understand your use case, but you sound like you're asking about the BooleanQuery API. You can get the clauses of your query by calling getClauses.
A simple example:
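import org.apache.lucene.index.Term;
import org.apache.lucene.search.*;

// Build an OR query over two terms, then read its clauses back.
BooleanQuery bq = new BooleanQuery();
bq.add(new TermQuery(new Term("field1", "a")), BooleanClause.Occur.SHOULD);
bq.add(new TermQuery(new Term("field1", "b")), BooleanClause.Occur.SHOULD);
BooleanClause[] clauses = bq.getClauses();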
EDIT: maybe you're just asking for a search algorithm. In pseudocode:
generate_query (qs_that_matched, qs_that_didnt_match, level):
    new_query = qs_that_matched AND level:qs_that_didnt_match
    qs_still_unmatched = ...
    qs_which_just_matched = ...
    if qs_still_unmatched != null:
        return generate_query(qs_that_matched AND qs_which_just_matched, qs_still_unmatched, level+1)
    else:
        return qs_that_matched AND qs_which_just_matched
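To make the pseudocode a bit more concrete, here is a minimal Java sketch of the same loop, written iteratively. It deliberately avoids Lucene: a document is modeled as a map from field name to a set of already-analyzed terms, and anyHit is a stand-in for "run the query and check for at least one result". The class name Cascade, the method names, and the choice to OR terms within a level are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory illustration; not Lucene code.
class Cascade {
    // True if the document's field contains at least one of the given terms.
    static boolean docMatches(Map<String, Set<String>> doc, String field,
                              Collection<String> anyOf) {
        Set<String> values = doc.getOrDefault(field, Collections.emptySet());
        for (String t : anyOf) if (values.contains(t)) return true;
        return false;
    }

    // Stand-in for "qs_that_matched AND level:term returns at least one hit".
    static boolean anyHit(List<Map<String, Set<String>>> docs,
                          Map<String, List<String>> matched,
                          String field, String term) {
        for (Map<String, Set<String>> doc : docs) {
            boolean ok = docMatches(doc, field, List.of(term));
            for (Map.Entry<String, List<String>> e : matched.entrySet()) {
                ok = ok && docMatches(doc, e.getKey(), e.getValue());
            }
            if (ok) return true;
        }
        return false;
    }

    // Walks the levels in order; each level claims the still-unmatched terms
    // that hit there, and only the remainder is passed to the next level.
    static Map<String, List<String>> assign(List<Map<String, Set<String>>> docs,
                                            List<String> levels, List<String> terms) {
        Map<String, List<String>> matched = new LinkedHashMap<>();
        List<String> unmatched = new ArrayList<>(terms);
        for (String field : levels) {
            if (unmatched.isEmpty()) break; // every term has been placed
            List<String> justMatched = new ArrayList<>();
            for (String term : unmatched) {
                if (anyHit(docs, matched, field, term)) justMatched.add(term);
            }
            if (!justMatched.isEmpty()) {
                matched.put(field, justMatched);
                unmatched.removeAll(justMatched);
            }
        }
        return matched;
    }
}
```

The returned map keeps the levels in the order they matched, so turning it into a query like +(path:...) +(brand:...) is a straightforward loop. Terms that never match any level are simply absent from the result.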
Upvotes: 1