Reputation: 13
I have a set of documents annotated with hierarchical taxonomy tags, e.g.:
[
  {
    "id": 1,
    "title": "a funny book",
    "authors": ["Jean Bon", "Alex Terieur"],
    "book_category": "/novel/comedy/new"
  },
  {
    "id": 2,
    "title": "a dramatic book",
    "authors": ["Alex Terieur"],
    "book_category": "/novel/drama"
  },
  {
    "id": 3,
    "title": "A hilarious book",
    "authors": ["Marc Assin", "Harry Covert"],
    "book_category": "/novel/comedy"
  },
  {
    "id": 4,
    "title": "A sad story",
    "authors": ["Gerard Menvusa", "Alex Terieur"],
    "book_category": "/novel/drama"
  },
  {
    "id": 5,
    "title": "A very sad story",
    "authors": ["Gerard Menvusa", "Alain Terieur"],
    "book_category": "/novel"
  }
]
I need to search books by "book_category". The search must return books that match the query category exactly or partially (up to a defined depth threshold) and score them differently according to the degree of match.
For example, the query "book_category=/novel/comedy" with "depth_threshold=1" must return books with book_category=/novel/comedy (score = 100%), as well as /novel and /novel/comedy/new (score < 100%).
I tried a TopScoreDocCollector in the search, but it only returns the books whose book_category contains the query category, and gives them all the same score.
How can I implement a search that also returns the more general categories and gives the results different match scores?
P.S.: I don't need faceted search.
Thanks
Upvotes: 1
Views: 547
Reputation: 13
This could be a solution, but I have more than one hierarchical field to query and I want to use the CategoryPath values indexed in the taxonomy. I'm using a DrillDownQuery:
DrillDownQuery luceneQuery = new DrillDownQuery(searchParams.indexingParams);
// each add() call is one dimension; separate dimensions are AND'ed together
luceneQuery.add(new CategoryPath("book_category/novel/comedy", '/'));
luceneQuery.add(new CategoryPath("subject/sub1/sub2", '/'));
This way the search returns the books that match both category paths and their descendants. To also retrieve the ancestors, I can start the drill-down from an ancestor of the requested CategoryPath (retrieved from the taxonomy).
The problem is that all results get the same score. I want to override the similarity/scoring function to compute a score based on CategoryPath length, comparing the query CategoryPath with each returned document's CategoryPath (book_category).
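The ancestor trick can be sketched in plain Java, without Lucene, to see what it returns on the sample data. This is a simulation under an assumption: a document matches a drill-down on a path when its own category equals that path or lies beneath it (the descendant behavior described above). `ancestor` and `drillDownMatches` are hypothetical helpers, not Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

public class AncestorDrillDown {

    // Hypothetical helper: the path `levels` steps above `path`,
    // e.g. ancestor("/novel/comedy", 1) gives "/novel".
    static String ancestor(String path, int levels) {
        String result = path;
        for (int i = 0; i < levels; i++) {
            int cut = result.lastIndexOf('/');
            if (cut <= 0) {
                break; // already a top-level category such as "/novel"
            }
            result = result.substring(0, cut);
        }
        return result;
    }

    // Simulated drill-down: matches the path itself or any descendant of it.
    static boolean drillDownMatches(String drillPath, String docPath) {
        return docPath.equals(drillPath) || docPath.startsWith(drillPath + "/");
    }

    public static void main(String[] args) {
        // book_category of the sample books, in id order (ids are 1-based)
        String[] categories = {
            "/novel/comedy/new", "/novel/drama", "/novel/comedy", "/novel/drama", "/novel"
        };
        String start = ancestor("/novel/comedy", 1);
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < categories.length; i++) {
            if (drillDownMatches(start, categories[i])) {
                ids.add(i + 1);
            }
        }
        System.out.println(start + " -> " + ids);
    }
}
```

The sketch makes the catch visible: drilling down from `/novel` also matches the `/novel/drama` books (ids 2 and 4), so the results still need to be filtered or scored by their distance from the requested path.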
E.g.:
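// pseudocode: levelDistance() is a hypothetical helper returning how many levels
// separate the two paths (compareTo() compares for ordering, not distance)
int distance = levelDistance(queryCategoryPath, bookCategoryPath);
if (distance == 0) {
    document.score = 1.0f;
} else if (distance == 1) {
    document.score = 0.9f;
} else if (distance == 2) {
    document.score = 0.8f;
} // and so on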
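The distance-based rule above can be made concrete in plain Java, independent of Lucene's scoring classes. This is a minimal sketch under stated assumptions: paths are '/'-separated strings, `levelDistance` and `score` are hypothetical names, the 0.1-per-level decrement follows the 1 / 0.9 / 0.8 example, and a result of 0 stands for "not returned" (different branch, or beyond the depth threshold).

```java
import java.util.Arrays;

public class CategoryPathScore {

    // Splits "/novel/comedy" into ["novel", "comedy"], dropping empty segments.
    static String[] components(String path) {
        return Arrays.stream(path.split("/"))
                .filter(s -> !s.isEmpty())
                .toArray(String[]::new);
    }

    // Levels between two paths when one is an ancestor of (or equal to) the other;
    // -1 when they sit on different branches, e.g. /novel/drama vs /novel/comedy.
    static int levelDistance(String queryPath, String docPath) {
        String[] q = components(queryPath);
        String[] d = components(docPath);
        int common = Math.min(q.length, d.length);
        for (int i = 0; i < common; i++) {
            if (!q[i].equals(d[i])) {
                return -1;
            }
        }
        return Math.abs(q.length - d.length);
    }

    // Hypothetical scoring rule: 1.0 for an exact match, minus 0.1 per level,
    // and 0 for other branches or anything beyond the depth threshold.
    static double score(String queryPath, String docPath, int depthThreshold) {
        int distance = levelDistance(queryPath, docPath);
        if (distance < 0 || distance > depthThreshold) {
            return 0.0;
        }
        return 1.0 - 0.1 * distance;
    }

    public static void main(String[] args) {
        // book_category of the sample books, in id order (ids are 1-based)
        String[] categories = {
            "/novel/comedy/new", "/novel/drama", "/novel/comedy", "/novel/drama", "/novel"
        };
        for (int i = 0; i < categories.length; i++) {
            double s = score("/novel/comedy", categories[i], 1);
            if (s > 0) {
                System.out.printf("id=%d book_category=%s score=%.1f%n", i + 1, categories[i], s);
            }
        }
    }
}
```

This version treats ancestors and descendants symmetrically (both `/novel` and `/novel/comedy/new` are one level away). If more general categories should rank lower than more specific ones, the sign of `d.length - q.length` can be used to pick a different decrement for each direction.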
Upvotes: 0
Reputation: 5974
There is no built-in query that supports this requirement, but you can use a DisjunctionMaxQuery with multiple ConstantScoreQuery clauses. The exact category and the more general category can be matched with simple TermQuery objects. For the sub-categories, you can use a MultiTermQuery such as RegexpQuery to match all sub-categories if you don't know them upfront. For example:
// the exact category (assumes book_category is indexed as a single un-analyzed term, e.g. a StringField)
Query directQuery = new TermQuery(new Term("book_category", "/novel/comedy"));
// regex that matches exactly one level below the exact category
Query narrowerQuery = new RegexpQuery(new Term("book_category", "/novel/comedy/[^/]+"));
// the more general category
Query broaderQuery = new TermQuery(new Term("book_category", "/novel"));
directQuery = new ConstantScoreQuery(directQuery);
narrowerQuery = new ConstantScoreQuery(narrowerQuery);
broaderQuery = new ConstantScoreQuery(broaderQuery);
// 100% for the exact category
directQuery.setBoost(1.0F);
// 80% for the more specific category
narrowerQuery.setBoost(0.8F);
// 50% for the more general category
broaderQuery.setBoost(0.5F);
DisjunctionMaxQuery query = new DisjunctionMaxQuery(0.0F);
query.add(directQuery);
query.add(narrowerQuery);
query.add(broaderQuery);
This would give a result like:
id=3 title=a hilarious book book_category=/novel/comedy score=1.000000
id=1 title=a funny book book_category=/novel/comedy/new score=0.800000
id=5 title=A very sad story book_category=/novel score=0.500000
For a complete test case, see this gist: https://gist.github.com/knutwalker/7959819
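To see why the disjunction yields exactly these scores, the scoring can be simulated in plain Java. This is not Lucene code: `Clause`, `CLAUSES` and `disMaxScore` are hypothetical names, each clause gives every category it matches a constant score equal to its boost (as a ConstantScoreQuery with `setBoost()` does here), and the combination follows the dis-max rule of "best clause score plus tieBreaker times the scores of the other matching clauses". Query normalization is ignored, which matches the output above where the boosts come through unchanged.

```java
import java.util.List;
import java.util.function.Predicate;

public class DisMaxSimulation {

    // One clause: every category it matches gets the same constant score (its boost).
    static final class Clause {
        final Predicate<String> matches;
        final double boost;

        Clause(Predicate<String> matches, double boost) {
            this.matches = matches;
            this.boost = boost;
        }
    }

    // The three clauses from the answer. String.matches() must match the whole
    // value, just as RegexpQuery must match the whole indexed term.
    static final List<Clause> CLAUSES = List.of(
            new Clause(c -> c.equals("/novel/comedy"), 1.0),
            new Clause(c -> c.matches("/novel/comedy/[^/]+"), 0.8),
            new Clause(c -> c.equals("/novel"), 0.5));

    // Dis-max: best matching clause score plus tieBreaker times the other
    // matching clause scores; 0 means no clause matched.
    static double disMaxScore(List<Clause> clauses, double tieBreaker, String category) {
        double max = 0.0;
        double sum = 0.0;
        for (Clause clause : clauses) {
            if (clause.matches.test(category)) {
                max = Math.max(max, clause.boost);
                sum += clause.boost;
            }
        }
        return max + tieBreaker * (sum - max);
    }

    public static void main(String[] args) {
        // book_category of the sample books, in id order (ids are 1-based)
        String[] categories = {
            "/novel/comedy/new", "/novel/drama", "/novel/comedy", "/novel/drama", "/novel"
        };
        for (int i = 0; i < categories.length; i++) {
            double score = disMaxScore(CLAUSES, 0.0, categories[i]);
            if (score > 0) {
                System.out.printf("id=%d book_category=%s score=%.6f%n", i + 1, categories[i], score);
            }
        }
    }
}
```

The simulation prints matches in id order rather than sorted by score as a Lucene search would. With a tie-breaker of 0, a category matched by several clauses keeps only its best clause score, so the boosts alone decide the ranking.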
Upvotes: 1