Cale

Reputation: 371

Searching objects having all nested children matching a given query in Elasticsearch

Given an object with the following mapping:

{
    "a": {
        "properties": {
            "id": {"type": "string"},
            "b": {
                "type": "nested",
                "properties": {
                    "key": {"type": "string"}
                }
            }
        }
    }
}

I want to retrieve all the instances of this object having all nested children matching a given query.

For example, suppose I want to retrieve all the instances having all children with "key" = "yes". Given the following instances:

{
    "id": "1",
    "b": [
        {
            "key": "yes"
        },
        {
            "key": "yes"
        }
    ] 
},
{
    "id": "2",
    "b": [
        {
            "key": "yes"
        },
        {
            "key": "yes"
        },
        {
            "key": "no"
        }
    ] 
}

I want to retrieve only the first one (the one with "id" = "1").

Either filters or queries are fine for me. I already tried the "not" filter and the "must_not" clause of the bool filter. The idea was to use a double negation to extract only the objects that don't have any nested child whose "key" differs from the given value. However, I was not able to write this query correctly.
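
To make the intent concrete, here is a minimal sketch in plain Python (not Elasticsearch) of what the double negation should compute on the two sample instances above. The function names are made up for illustration. Note that an instance with no children at all would also pass both checks.

```python
# Sample instances from the question, as plain Python dicts.
docs = [
    {"id": "1", "b": [{"key": "yes"}, {"key": "yes"}]},
    {"id": "2", "b": [{"key": "yes"}, {"key": "yes"}, {"key": "no"}]},
]


def all_children_match(doc, value):
    """Direct 'for all' check: every nested child has key == value."""
    return all(child["key"] == value for child in doc["b"])


def no_child_differs(doc, value):
    """Double negation: there is NO child whose key is NOT value."""
    return not any(child["key"] != value for child in doc["b"])


# Both formulations select the same instances.
forall_ids = [d["id"] for d in docs if all_children_match(d, "yes")]
double_neg_ids = [d["id"] for d in docs if no_child_differs(d, "yes")]
print(forall_ids, double_neg_ids)
```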

I realize that this is not a common query for a search engine, but, in my case, it can be useful.

Is it possible to write this query ("forall nested query") using nested objects? In case it is not, would it be possible to write this query using parent-child?

Update

Andrei Stefan gave a good answer for the case where we know all the values of "key" that we want to avoid ("no" in the example).

I am also interested in the case where you don't know the values you want to avoid and just want to match objects whose nested children all have "key" = "yes".

Upvotes: 2

Views: 1651

Answers (2)

DruidKuma

Reputation: 2480

I encountered the same problem, though with more than just yes/no values. As per Clinton Gormley's answer in https://github.com/elastic/elasticsearch/issues/19166: "You can't do it any efficient way. You have to count all children and compare that to how many children match. The following will return all parents where all children match but it is a horrible inefficient solution and I would never recommend using it in practice":

{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "b",
            "score_mode": "sum",
            "query": {
              "function_score": {
                "query": {
                  "match_all": {}
                },
                "functions": [
                  {
                    "weight": -1
                  },
                  {
                    "filter": {
                      "match": {
                        "b.key": "yes"
                      }
                    },
                    "weight": 1
                  }
                ],
                "score_mode": "sum",
                "boost_mode": "replace"
              }
            }
          }
        }
      ]
    }
  }
}
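
As far as I can tell, the scoring in this query works out as follows: every child gets -1 from the unfiltered function, a matching child gets +1 on top of that, and the nested query sums the child scores, so a parent scores 0 only when all of its children match. A rough plain-Python sketch of that arithmetic on the instances from the question (the helper names are hypothetical, and exact string equality stands in for the match filter):

```python
# Sample instances from the question, as plain Python dicts.
docs = [
    {"id": "1", "b": [{"key": "yes"}, {"key": "yes"}]},
    {"id": "2", "b": [{"key": "yes"}, {"key": "yes"}, {"key": "no"}]},
]


def child_score(child, value):
    """Inner function_score with score_mode 'sum' and boost_mode 'replace'."""
    score = -1  # unfiltered function: weight -1 for every child
    if child["key"] == value:
        score += 1  # filtered function: weight +1 for matching children
    return score


def parent_score(doc, value):
    """Nested query with score_mode 'sum': add up the child scores."""
    return sum(child_score(child, value) for child in doc["b"])


# A score of 0 means "matching children == total children".
scores = {d["id"]: parent_score(d, "yes") for d in docs}
print(scores)
```

The query as quoted only computes this score. How to keep just the parents that score 0 is not shown there and would still have to be added.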

Upvotes: 0

Andrei Stefan

Reputation: 52368

You need a flattened data structure for this: an array of values. The simplest way, without changing the current mapping too much, is to use the include_in_parent property and, for this particular requirement, to query the field that is included in the parent:

{
  "mappings": {
    "a": {
      "properties": {
        "id": {
          "type": "string"
        },
        "b": {
          "type": "nested",
          "include_in_parent": true,
          "properties": {
            "key": {
              "type": "string"
            }
          }
        }
      }
    }
  }
}

And then your query would look like this:

{
  "query": {
    "filtered": {
      "filter": {
        "and": [
          {
            "query": {
              "query_string": { "query": "b.key:(yes NOT no)"}
            }
          }
        ]
      }
    }
  }
}
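
To see why this works, and where it stops working, here is a small plain-Python sketch of what the flattened field looks like and what "yes NOT no" checks. It is only an illustration: the helper names are hypothetical, instance "3" is made up, and exact string comparison stands in for the analyzed field.

```python
# Instances "1" and "2" are from the question; "3" is a made-up extra case.
docs = [
    {"id": "1", "b": [{"key": "yes"}, {"key": "yes"}]},
    {"id": "2", "b": [{"key": "yes"}, {"key": "yes"}, {"key": "no"}]},
    {"id": "3", "b": [{"key": "yes"}, {"key": "maybe"}]},
]


def flattened_keys(doc):
    """Mimic the flattened b.key field: one flat array of values per parent."""
    return [child["key"] for child in doc["b"]]


def matches_yes_not_no(doc):
    """Mimic b.key:(yes NOT no): contains 'yes' and does not contain 'no'."""
    keys = flattened_keys(doc)
    return "yes" in keys and "no" not in keys


matching_ids = [d["id"] for d in docs if matches_yes_not_no(d)]
print(matching_ids)
```

Instance "3" slips through because "maybe" was never excluded, so this approach only covers the case where all the values to avoid are known in advance.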

The alternative is to change the type of the field from nested to object, but this way you'll lose the advantages of using nested fields:

{
  "mappings": {
    "a": {
      "properties": {
        "id": {
          "type": "string"
        },
        "b": {
          "type": "object",
          "properties": {
            "key": {
              "type": "string"
            }
          }
        }
      }
    }
  }
}

The query remains the same.

Upvotes: 2
