w00ngy

Reputation: 1828

ElasticSearch NEST combining AND with OR queries

Problem

How do you write NEST code that generates an Elasticsearch query for this simple boolean logic?

term1 && (term2 || term3 || term4)

Pseudocode for the logic I'm implementing with NEST (5.2) to query Elasticsearch (5.2):

// additional requirements
( truckOemName = "HYSTER" && truckModelName = "S40FT" && partCategoryCode = "RECO" && partID != "")

//Section I can't get working correctly
AND (
    ( SerialRangeInclusiveFrom <= "F187V-6785D" AND SerialRangeInclusiveTo >= "F187V-6060D" )
    OR 
    ( SerialRangeInclusiveFrom = "" || SerialRangeInclusiveTo = "" )
)

Interpretation of Related Documentation

The "Combining queries with || or should clauses" section of Writing Bool Queries mentions

The bool query does not quite follow the same boolean logic you expect from a programming language. term1 && (term2 || term3 || term4) does not become

bool
|___must
|   |___term1
|
|___should
   |___term2
   |___term3
   |___term4

you could get back results that only contain term1

which is exactly what I think is happening.

But I don't understand how to apply their solution with NEST. Is the answer one of these?

  1. Add parentheses to force evaluation order (I am)
  2. Use a boost factor? (what?)

Code

Here's the NEST code

 var searchDescriptor = new SearchDescriptor<ElasticPart>();
 var terms = new List<Func<QueryContainerDescriptor<ElasticPart>, QueryContainer>>
 {
     s =>
         (s.TermRange(r => r.Field(f => f.SerialRangeInclusiveFrom)
              .LessThanOrEquals(dataSearchParameters.SerialRangeEnd))
          &&
          s.TermRange(r => r.Field(f => f.SerialRangeInclusiveTo)
              .GreaterThanOrEquals(dataSearchParameters.SerialRangeStart)))
         //None of the data that matches these ORs returns with the query this code generates, below.
         ||
         (!s.Exists(exists => exists.Field(f => f.SerialRangeInclusiveFrom))
          ||
          !s.Exists(exists => exists.Field(f => f.SerialRangeInclusiveTo))
         )
 };

 //Terms is the piece in question
 searchDescriptor.Query(s => s.Bool(bq => bq.Filter(terms))
     && !s.Terms(term => term.Field(x => x.OemID)
         .Terms(RulesHelper.GetOemExclusionList(exclusions))));

 searchDescriptor.Aggregations(a => a
     .Terms(aggPartInformation, t => t.Script(s => s.Inline(script)).Size(50000))
 );
 searchDescriptor.Type(string.Empty);
 searchDescriptor.Size(0);

 var searchResponse = ElasticClient.Search<ElasticPart>(searchDescriptor);

Here's the ES JSON query it generates

{
   "query":{
      "bool":{
         "must":[
            {
               "term":{ "truckOemName": { "value":"HYSTER" }}
            },
            {
               "term":{ "truckModelName": { "value":"S40FT" }}
            },
            {
               "term":{ "partCategoryCode": { "value":"RECO" }}
            },
            {
               "bool":{
                  "should":[
                     {
                        "bool":{
                           "must":[
                              {
                                 "range":{ "serialRangeInclusiveFrom": { "lte":"F187V-6785D" }}
                              },
                              {
                                 "range":{ "serialRangeInclusiveTo": { "gte":"F187V-6060D" }}
                              }
                           ]
                        }
                     },
                     {
                        "bool":{
                           "must_not":[
                              {
                                 "exists":{ "field":"serialRangeInclusiveFrom" }
                              }
                           ]
                        }
                     },
                     {
                        "bool":{
                           "must_not":[
                              {
                                 "exists":{ "field":"serialRangeInclusiveTo" }
                              }
                           ]
                        }
                     }
                  ]
               }
            },
            {
               "exists":{
                  "field":"partID"
               }
            }
         ]
      }
   }
}

Here's the query we'd like it to generate that seems to work.

{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "must": [
              {
                "term": { "truckOemName": { "value": "HYSTER" }}
              },
              {
                "term": {"truckModelName": { "value": "S40FT" }}
              },
              {
                "term": {"partCategoryCode": { "value": "RECO" }}
              },
              {
                "exists": { "field": "partID" }
              }
            ],
            "should": [
              {
                "bool": {
                  "must": [
                    {
                      "range": { "serialRangeInclusiveFrom": {"lte": "F187V-6785D"}}
                    },
                    {
                      "range": {"serialRangeInclusiveTo": {"gte": "F187V-6060D"}}
                    }
                  ]
                }
              },
              {
                "bool": {
                  "must_not": [
                    {
                      "exists": {"field": "serialRangeInclusiveFrom"}
                    },
                    {
                      "exists": {  "field": "serialRangeInclusiveTo"}
                    }
                  ]
                }
              }
            ]
          }
        }
      ]
    }
  }
}


Upvotes: 4

Views: 4648

Answers (1)

Russ Cam

Reputation: 125488

With overloaded operators for bool queries, it is not possible to express a must clause combined with a should clause i.e.

term1 && (term2 || term3 || term4)

becomes

bool
|___must
   |___term1
   |___bool
       |___should
           |___term2
           |___term3
           |___term4

which is a bool query with two must clauses, where the second must clause is itself a bool query that requires a match on at least one of its should clauses. NEST combines the queries this way because it matches the expectation for boolean logic within .NET.
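As a minimal sketch of the operator form described above: the `Doc` type and its `Tag` property are hypothetical and exist only for illustration. The expression uses NEST's overloaded `&&` and `||` operators on `QueryContainer`, which should serialize to the nested must/should tree shown above.

```csharp
using Nest;

// Hypothetical document type with a single keyword field, for illustration only.
public class Doc
{
    public string Tag { get; set; }
}

public static class OperatorFormExample
{
    // term1 && (term2 || term3 || term4) written with NEST's overloaded operators.
    // The || group becomes a nested bool/should inside the outer bool/must.
    public static SearchDescriptor<Doc> Build()
    {
        return new SearchDescriptor<Doc>()
            .Query(q =>
                q.Term(f => f.Tag, "term1")
                && (q.Term(f => f.Tag, "term2")
                    || q.Term(f => f.Tag, "term3")
                    || q.Term(f => f.Tag, "term4")));
    }
}
```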

If it did become

bool
|___must
|   |___term1
|
|___should
   |___term2
   |___term3
   |___term4

then a document would be considered a match if it satisfied only the must clause. The should clauses in this case act as a boost, i.e. if a document matches one or more of the should clauses in addition to the must clause, then it will have a higher relevancy score, assuming that term2, term3 and term4 are queries that calculate a relevancy score.

On this basis, the query that you would like to generate expresses that for a document to be considered a match, it must match all four queries in the must clause

"must": [
  {
    "term": { "truckOemName": { "value": "HYSTER" }}
  },
  {
    "term": {"truckModelName": { "value": "S40FT" }}
  },
  {
    "term": {"partCategoryCode": { "value": "RECO" }}
  },
  {
    "exists": { "field": "partID" }
  }
],

then, for documents matching the must clauses, if

  1. it has a serialRangeInclusiveFrom less than or equal to "F187V-6785D" and a serialRangeInclusiveTo greater than or equal to "F187V-6060D"

    or

  2. it has neither a serialRangeInclusiveFrom nor a serialRangeInclusiveTo field

then boost that document's relevancy score. The crucial point is that

If a document matches the must clauses but does not match any of the should clauses, it will still be a match for the query (but have a lower relevancy score).

If that is the intent, this query can be constructed using the longer form of the Bool query
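A minimal sketch of what the longer form might look like, not necessarily the exact code intended here. `SerialRangeInclusiveFrom` and `SerialRangeInclusiveTo` come from the question's code; the other property names on `ElasticPart` are assumptions inferred from the JSON field names, and the class below is a stand-in for the real one.

```csharp
using Nest;

// Stand-in POCO. Only the two SerialRange properties appear in the question's code;
// the remaining names are assumed from the JSON field names (NEST camel-cases them by default).
public class ElasticPart
{
    public string TruckOemName { get; set; }
    public string TruckModelName { get; set; }
    public string PartCategoryCode { get; set; }
    public string PartID { get; set; }
    public string SerialRangeInclusiveFrom { get; set; }
    public string SerialRangeInclusiveTo { get; set; }
}

public static class LongFormBoolExample
{
    public static SearchDescriptor<ElasticPart> Build()
    {
        return new SearchDescriptor<ElasticPart>()
            .Query(q => q
                .Bool(b => b
                    // All four must clauses are required for a match.
                    .Must(
                        m => m.Term(f => f.TruckOemName, "HYSTER"),
                        m => m.Term(f => f.TruckModelName, "S40FT"),
                        m => m.Term(f => f.PartCategoryCode, "RECO"),
                        m => m.Exists(e => e.Field(f => f.PartID))
                    )
                    // Sibling should clauses: optional here, they only affect scoring.
                    .Should(
                        s => s.Bool(sb => sb.Must(
                            r => r.TermRange(t => t
                                .Field(f => f.SerialRangeInclusiveFrom)
                                .LessThanOrEquals("F187V-6785D")),
                            r => r.TermRange(t => t
                                .Field(f => f.SerialRangeInclusiveTo)
                                .GreaterThanOrEquals("F187V-6060D")))),
                        s => s.Bool(sb => sb.MustNot(
                            n => n.Exists(e => e.Field(f => f.SerialRangeInclusiveFrom)),
                            n => n.Exists(e => e.Field(f => f.SerialRangeInclusiveTo))))
                    )
                )
            );
    }
}
```

Because the should clauses sit beside the must clauses, a document that matches only the must clauses is still returned. If at least one should clause has to match, either keep the operator form from the question or set the bool query's `minimum_should_match` parameter to 1.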

Upvotes: 2
