Schaka

Reputation: 772

How do I force the combination of two queries to be MUST in Lucene?

I am using a mix of Hibernate Search and Apache Lucene. What I am doing should be fairly straightforward, yet I am unable to achieve my goal.

I have lists of strings (phrases), and I want to query a field for each list. A field counts as a match if it contains any one of its strings.

In MySQL, it would look like this:

select * from movies where (genres = 'name' or genres = 'name2') OR (actors = 'name' or actors = 'name2') AND (actors = 'name' or actors = 'name2')

So if a movie contained at least 1 genre given and 1 actor given or 2 actors, the condition would be fulfilled. Now in Lucene I first build a BooleanQuery combining all possible actors with Occur.SHOULD. Then I build another BooleanQuery combining the previous BooleanQuery with another (which contains all genres, for example).

In the end, I do the same twice and add both of these BooleanQueries to a new one, both with Occur.MUST. However, I am receiving results where only one of my conditions is fulfilled, not at least two. How do I go about solving this?

private BooleanQuery getMatchQuery(List<String> list, String field) {
    BooleanQuery bq = new BooleanQuery();
    QueryBuilder qb = getFullTextEntityManager().getSearchFactory().buildQueryBuilder().forEntity(Movie.class).get();
    for (String string : list) {
        bq.add(qb.phrase().onField(field).sentence(string).createQuery(), Occur.SHOULD);
    }
    return bq;
}

private BooleanQuery getParamMatches(MovieDto dto, boolean genres) {
    BooleanQuery bq = new BooleanQuery();
    bq.add(getMatchQuery(dto.getActors(), "actors"), Occur.SHOULD);
    bq.add(getMatchQuery(dto.getDirectors(), "directors"), Occur.SHOULD);
    bq.add(getMatchQuery(dto.getWriters(), "writers"), Occur.SHOULD);
    if (genres) {
        bq.add(getMatchQuery(dto.getGenres(), "genres"), Occur.SHOULD);
    }
    return bq;
}

public List<Movie> test(MovieDto dto) {
    QueryBuilder qb = getFullTextEntityManager().getSearchFactory().buildQueryBuilder().forEntity(Movie.class).get();
    log.info(getMatches(dto.getActors()));
    BooleanQuery bq = new BooleanQuery();
    bq.add(getParamMatches(dto, true), Occur.MUST);
    bq.add(getParamMatches(dto, false), Occur.MUST);
    javax.persistence.Query query = getFullTextEntityManager().createFullTextQuery(bq, Movie.class);
    List<Movie> result = query.getResultList();
    return result;
}

This is the order in which I am doing this as described above. Calls are done from bottom to top though. The result query is this one:

+((actors:"marlon brando" actors:"al pacino" actors:"james caan" actors:"richard s castellano")
 (directors:"francis ford coppola") (writers:"mario puzo screenplay" writers:"francis ford coppola screenplay" writers:"mario puzo novel")
 (genres:crime genres:drama)) 
+((actors:"marlon brando" actors:"al pacino" actors:"james caan" actors:"richard s castellano")
 (directors:"francis ford coppola") (writers:"mario puzo screenplay" writers:"francis ford coppola screenplay" writers:"mario puzo novel"))

So, how do I go about making both conditions mandatory in combination, so that I won't receive results in which only one actor, director etc is present? I want at least 2 parameters to match, one out of each query.

Upvotes: 0

Views: 432

Answers (1)

femtoRgon

Reputation: 33341

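Your comment is correct: both of your subqueries can match the same term, and in every result of the given query they certainly will. The second subquery's clauses are a subset of the first's, so whatever satisfies the second also satisfies the first.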
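
To make that concrete, here is a minimal plain-Java model of the two MUST clauses. It is not Lucene code; the class name, method names, and the map-based "document" are all made up for illustration. It only mimics the boolean logic of the printed query.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model, not the Lucene API: a document maps field names
// to the set of phrases that field contains.
class MustOverlapSketch {

    // Models one BooleanQuery of SHOULD phrase clauses on a single field:
    // true if the field contains at least one of the wanted phrases.
    static boolean fieldMatches(Map<String, Set<String>> doc, String field, List<String> wanted) {
        Set<String> present = doc.getOrDefault(field, Set.of());
        return wanted.stream().anyMatch(present::contains);
    }

    // Models getParamMatches: SHOULD over the per-field groups,
    // so a single matching field is enough.
    static boolean anyFieldMatches(Map<String, Set<String>> doc,
                                   Map<String, List<String>> wanted,
                                   List<String> fields) {
        return fields.stream()
                .anyMatch(f -> fieldMatches(doc, f, wanted.getOrDefault(f, List.of())));
    }

    // Models the outer query:
    // +(actors directors writers genres) +(actors directors writers)
    static boolean bothMust(Map<String, Set<String>> doc, Map<String, List<String>> wanted) {
        boolean withGenres = anyFieldMatches(doc, wanted,
                List.of("actors", "directors", "writers", "genres"));
        boolean withoutGenres = anyFieldMatches(doc, wanted,
                List.of("actors", "directors", "writers"));
        return withGenres && withoutGenres;
    }
}
```

In this model, a document whose only hit is a single actor makes both `withGenres` and `withoutGenres` true, which is the behaviour described in the question.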

There is an easier way to make sure at least two subqueries of a boolean query match than enumerating every possible combination, though: BooleanQuery.setMinimumNumberShouldMatch. So:

BooleanQuery query = getParamMatches(dto, true);
query.setMinimumNumberShouldMatch(2);

That query would require a match in at least two of your fields. If you wanted a hit on any two matched terms, regardless of whether they are in different fields or not, you'd want to add them all to the same BooleanQuery. That would probably mean modifying getMatchQuery to accept the BooleanQuery as an argument and just add to it, instead of creating a new one.
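
The difference between the two layouts can be sketched in plain Java. Again, this is a hypothetical model and not the Lucene API: the class and method names are invented, and it assumes each wanted phrase is listed only once per field. It counts matching field groups for the nested layout and matching phrases for the flat one.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model, not the Lucene API: a document maps field names
// to the set of phrases that field contains.
class MinShouldMatchSketch {

    // Number of wanted phrases that are present in one field of the document.
    static long matchedPhrases(Map<String, Set<String>> doc, String field, List<String> wanted) {
        Set<String> present = doc.getOrDefault(field, Set.of());
        return wanted.stream().filter(present::contains).count();
    }

    // Nested layout: one SHOULD group per field, and at least `min`
    // of those groups must have a hit.
    static boolean minFieldsMatch(Map<String, Set<String>> doc,
                                  Map<String, List<String>> wanted, int min) {
        long fields = wanted.entrySet().stream()
                .filter(e -> matchedPhrases(doc, e.getKey(), e.getValue()) > 0)
                .count();
        return fields >= min;
    }

    // Flat layout: every phrase clause sits in the same query, and at
    // least `min` phrases must match, whichever fields they are in.
    static boolean minPhrasesMatch(Map<String, Set<String>> doc,
                                   Map<String, List<String>> wanted, int min) {
        long phrases = wanted.entrySet().stream()
                .mapToLong(e -> matchedPhrases(doc, e.getKey(), e.getValue()))
                .sum();
        return phrases >= min;
    }
}
```

With a minimum of 2, a movie that matches two actors and nothing else fails the nested check but passes the flat one, while a movie that matches one actor and one genre passes both.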

Upvotes: 1
