Michał Mazur

Reputation: 125

Query Elasticsearch documents where a property in a nested object has a null value

I am struggling with an Elasticsearch query.

These are example documents that I would like to query. They are documents with generic properties:

[
    {
        "field1": "value",
        "properties": [
            {
                "propertyBooleanValue": null,
                "propertyName": "Product name",
                "propertyDateValue": null,
                "propertyType": "TEXT",
                "propertyStringValue": "SUPER Cool Extreme",
                "propertyNumericValue": null
            },
            {
                "propertyBooleanValue": null,
                "propertyName": "Product expiration date",
                "propertyDateValue": null,
                "propertyType": "DATE",
                "languageCode": null,
                "propertyNumericValue": null
            }
        ]
    },
    {
        "field1": "blah blah",
        "properties": [
            {
                "propertyBooleanValue": null,
                "propertyName": "Product name",
                "propertyDateValue": null,
                "propertyType": "TEXT",
                "propertyStringValue": "So boring",
                "propertyNumericValue": null
            },
            {
                "propertyBooleanValue": null,
                "propertyName": "Product expiration date",
                "propertyDateValue": "2020-04-02",
                "propertyType": "DATE",
                "languageCode": null,
                "propertyNumericValue": null
            }
        ]
    },
    {
        "field1": "wow2",
        "properties": [
            {
                "propertyBooleanValue": null,
                "propertyName": "Product name",
                "propertyDateValue": null,
                "propertyType": "TEXT",
                "propertyStringValue": "iPear",
                "propertyNumericValue": null
            },
            {
                "propertyBooleanValue": null,
                "propertyName": "Product expiration date",
                "propertyDateValue": null,
                "propertyType": "DATE",
                "languageCode": null,
                "propertyNumericValue": null
            }
        ]
    }
]

I would like to get only the documents that have a nested object with "propertyName" = "Product expiration date" and "propertyDateValue" = null.

I use the following query, but it returns all documents:

{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "query": {
              "bool": {
                "must": [
                  {
                    "bool": {
                      "must_not": [
                        {
                          "exists": {
                            "field": "properties.propertyDateValue"
                          }
                        }
                      ]
                    }
                  },
                  {
                    "term": {
                      "properties.propertyName": {
                        "value": "Product expiration date"
                      }
                    }
                  }
                ]
              }
            },
            "path": "properties"
          }
        }
      ]
    }
  }
}

We use Elasticsearch 7.7.

Upvotes: 1

Views: 1372

Answers (1)

Joe - Check out my books

Reputation: 16925

As @jaspreet mentioned, the result is expected. To elaborate, you can use the inner_hits parameter to retrieve only those nested subdocuments under properties that actually matched both conditions. In the request below, setting _source to "inner_hits" hides the default top-level source, and the inner_hits object has to sit inside the nested clause, next to path:

{
  "_source": "inner_hits",
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "query": {
              "bool": {
                "must": [
                  {
                    "bool": {
                      "must_not": [
                        {
                          "exists": {
                            "field": "properties.propertyDateValue"
                          }
                        }
                      ]
                    }
                  },
                  {
                    "term": {
                      "properties.propertyName": {
                        "value": "Product expiration date"
                      }
                    }
                  }
                ]
              }
            },
            "path": "properties",
            "inner_hits": {}
          }
        }
      ]
    }
  }
}

yielding

[
      {
        "_index" : "mich",
        "_type" : "_doc",
        "_id" : "6iLSVHEBZbobBB0NSl9x",
        "_score" : 0.6931472,
        "_source" : { },
        "inner_hits" : {
          "properties" : {
            "hits" : {
              "total" : {
                "value" : 1,
                "relation" : "eq"
              },
              "max_score" : 0.6931472,
              "hits" : [
                {
                  "_index" : "mich",
                  "_type" : "_doc",
                  "_id" : "6iLSVHEBZbobBB0NSl9x",
                  "_nested" : {
                    "field" : "properties",
                    "offset" : 1
                  },
                  "_score" : 0.6931472,
                  "_source" : {
                    "propertyBooleanValue" : null,
                    "propertyName" : "Product expiration date",
                    "propertyDateValue" : null,
                    "propertyType" : "DATE",
                    "languageCode" : null,
                    "propertyNumericValue" : null
                  }
                }
              ]
            }
          }
        }
      },
      ...
    ]

which was probably what you were looking for.
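
If you post-process the response in application code, the matched nested objects can be pulled out of each hit. Below is a minimal Python sketch that works on plain dictionaries shaped like the response above; the helper name matched_nested_objects and the trimmed sample_hits data are made up for illustration, and no Elasticsearch client is involved.

```python
from typing import Any, Dict, List


def matched_nested_objects(
    hits: List[Dict[str, Any]], name: str = "properties"
) -> Dict[str, List[Dict[str, Any]]]:
    """Map each top-level hit's _id to the _source of its matching nested objects."""
    matched: Dict[str, List[Dict[str, Any]]] = {}
    for hit in hits:
        # inner_hits are keyed by name, which defaults to the nested path.
        inner = hit.get("inner_hits", {}).get(name, {}).get("hits", {}).get("hits", [])
        matched[hit["_id"]] = [nested["_source"] for nested in inner]
    return matched


# Trimmed copy of the first hit from the response above (JSON null becomes None).
sample_hits = [
    {
        "_id": "6iLSVHEBZbobBB0NSl9x",
        "_source": {},
        "inner_hits": {
            "properties": {
                "hits": {
                    "hits": [
                        {
                            "_nested": {"field": "properties", "offset": 1},
                            "_source": {
                                "propertyName": "Product expiration date",
                                "propertyDateValue": None,
                                "propertyType": "DATE",
                            },
                        }
                    ]
                }
            }
        },
    }
]

print(matched_nested_objects(sample_hits))
```

Hits without an inner_hits section simply map to an empty list.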


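Keep in mind that the first query is different from the following one, which uses two separate nested clauses. Each clause can be satisfied by a different nested subdocument, so the two conditions no longer have to hold for the same object, unlike the AND inside the first query. In this case, each inner_hits needs a unique name, because both would otherwise default to the same name (the path).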

{
  "_source": "inner_hits", 
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "properties",
            "query": {
              "bool": {
                "must": [
                  {
                    "bool": {
                      "must_not": [
                        {
                          "exists": {
                            "field": "properties.propertyDateValue"
                          }
                        }
                      ]
                    }
                  }
                ]
              }
            },
            "inner_hits": {
              "name": "NULL_propertyDateValue"
            }
          }
        },
        {
          "nested": {
            "path": "properties",
            "query": {
              "bool": {
                "must": [
                  {
                    "term": {
                      "properties.propertyName": {
                        "value": "Product expiration date"
                      }
                    }
                  }
                ]
              }
            },
            "inner_hits": {
                "name": "MATCH_IN_propertyName"
            }
          }
        }
      ]
    }
  }
}
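
To make the difference concrete, here is a rough Python model of the two matching behaviours, run over trimmed copies of the three documents from the question. It is plain Python and not Elasticsearch itself; the function names are invented for this sketch, and it assumes properties is mapped as nested so that each array element is matched as its own object.

```python
# Trimmed copies of the question's documents: only the fields the query looks at.
docs = [
    {"field1": "value", "properties": [
        {"propertyName": "Product name", "propertyDateValue": None},
        {"propertyName": "Product expiration date", "propertyDateValue": None},
    ]},
    {"field1": "blah blah", "properties": [
        {"propertyName": "Product name", "propertyDateValue": None},
        {"propertyName": "Product expiration date", "propertyDateValue": "2020-04-02"},
    ]},
    {"field1": "wow2", "properties": [
        {"propertyName": "Product name", "propertyDateValue": None},
        {"propertyName": "Product expiration date", "propertyDateValue": None},
    ]},
]


def has_no_date(prop):
    # Stands in for must_not + exists on properties.propertyDateValue.
    return prop.get("propertyDateValue") is None


def is_expiration(prop):
    # Stands in for the term query on properties.propertyName.
    return prop.get("propertyName") == "Product expiration date"


def single_nested_clause(doc):
    # One nested clause: both conditions must hold for the SAME nested object.
    return any(has_no_date(p) and is_expiration(p) for p in doc["properties"])


def two_nested_clauses(doc):
    # Two nested clauses: each condition may be met by a DIFFERENT nested object.
    props = doc["properties"]
    return any(has_no_date(p) for p in props) and any(is_expiration(p) for p in props)


same_object = [d["field1"] for d in docs if single_nested_clause(d)]
any_object = [d["field1"] for d in docs if two_nested_clauses(d)]
print(same_object)  # ['value', 'wow2']
print(any_object)   # ['value', 'blah blah', 'wow2']
```

In this model the second document is matched by the two-clause form only: its "Product name" object has no date and its "Product expiration date" object has the right name, but no single object satisfies both conditions.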

Long story short, go with the first query and feel free to limit the returned response using inner_hits.
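
If you build the request body in code, the first query can be parameterised. This is only a sketch that assembles the same JSON structure as a Python dictionary; the function name and its parameters are hypothetical, and sending the body to the cluster is left to whatever client you use.

```python
import json
from typing import Any, Dict


def nested_missing_value_query(
    path: str, name_field: str, name_value: str, value_field: str
) -> Dict[str, Any]:
    """Build the first query: one nested clause holding both conditions."""
    return {
        "_source": "inner_hits",  # hides the default top-level source
        "query": {
            "bool": {
                "must": [
                    {
                        "nested": {
                            "path": path,
                            "query": {
                                "bool": {
                                    "must": [
                                        {
                                            "bool": {
                                                "must_not": [
                                                    {"exists": {"field": f"{path}.{value_field}"}}
                                                ]
                                            }
                                        },
                                        {"term": {f"{path}.{name_field}": {"value": name_value}}},
                                    ]
                                }
                            },
                            "inner_hits": {},  # must live inside the nested clause
                        }
                    }
                ]
            }
        },
    }


body = nested_missing_value_query(
    "properties", "propertyName", "Product expiration date", "propertyDateValue"
)
print(json.dumps(body, indent=2))
```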

Upvotes: 1
