Saurabh Nanda

What does a "must" clause with an array of "match" clauses really mean?

I have an Elasticsearch query that looks like this:

{
  "query": {
    "bool": {
      "must": [{
        "match": {"attrs.name": "username"}
      }, {
        "match": {"attrs.value": "johndoe"}
      }]
    }
  }
}

... and documents in the index that look like this:

{
  "key": "value",
  "attrs": [{
    "name": "username",
    "value": "jimihendrix"
  }, {
    "name": "age",
    "value": 23
  }, {
    "name": "alias",
    "value": "johndoe"
  }]
}

Which of the following does this query really mean?

  1. The document should contain either attrs.name = username OR attrs.value = johndoe.
  2. The document should contain both attrs.name = username AND attrs.value = johndoe, even if they match different elements in the attrs array (so the document above would match the query).
  3. The document should contain both attrs.name = username AND attrs.value = johndoe, and both must match the same element in the attrs array (so the document above would not match the query).

Further, how do I write a query to express #3 from the list above, i.e. the document should match only if a single element inside the attrs array satisfies both of the following conditions: attrs.name = username and attrs.value = johndoe?

Answers (2)

user11935734

Based on your requirements, you need to define your attrs field as nested; see the nested field type in the Elasticsearch documentation for more information. Disclaimer: the nested type maintains the relationship between the fields of each array element, but it is more costly to query.

The answer to your other two questions also depends on which data type you use; see the comparison of the nested and object data types for more details.

Edit: solution using sample mapping, example docs and expected result

Index mapping using nested type

{
    "mappings": {
        "properties": {
            "attrs": {
                "type": "nested"
            }
        }
    }
}

Index two sample documents, one that satisfies the criteria and one that doesn't. This one satisfies the criteria:

{
    "attrs": [
        {
            "name": "username",
            "value": "johndoe"
        },
        {
            "name": "alias",
            "value": "myname"
        }
    ]
}

This one doesn't satisfy the criteria (it has the same attrs as the document in the question):

{
    "attrs": [
        {
            "name": "username",
            "value": "jimihendrix"
        },
        {
            "name": "age",
            "value": 23
        },
        {
            "name": "alias",
            "value": "johndoe"
        }
    ]
}

And the search query:

{
  "query": {
    "nested": {
      "path": "attrs",
      "inner_hits": {}, 
      "query": {
        "bool": {
          "must": [
            {
              "match": {
                "attrs.name": "username"
              }
            },
            {
              "match": {
                "attrs.value": "johndoe"
              }
            }
          ]
        }
      }
    }
  }
}

And the search result (only the document that satisfies the criteria is returned):

 "hits": [
            {
                "_index": "nested",
                "_type": "_doc",
                "_id": "2",
                "_score": 1.7509375,
                "_source": {
                    "attrs": [
                        {
                            "name": "username",
                            "value": "johndoe"
                        },
                        {
                            "name": "alias",
                            "value": "myname"
                        }
                    ]
                },
                "inner_hits": {
                    "attrs": {
                        "hits": {
                            "total": {
                                "value": 1,
                                "relation": "eq"
                            },
                            "max_score": 1.7509375,
                            "hits": [
                                {
                                    "_index": "nested",
                                    "_type": "_doc",
                                    "_id": "2",
                                    "_nested": {
                                        "field": "attrs",
                                        "offset": 0
                                    },
                                    "_score": 1.7509375,
                                    "_source": {
                                        "name": "username",
                                        "value": "johndoe"
                                    }
                                }
                            ]
                        }
                    }
                }
            }
        ]
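If you query from code, the matched nested objects can be read out of inner_hits. Below is a minimal Python sketch that parses a trimmed copy of the response above; it assumes the full response has the usual {"hits": {"hits": [...]}} envelope, keeps only the fields it uses, and matched_nested is a hypothetical helper name, not part of any client library.

```python
import json

# Trimmed copy of the search response above. The outer {"hits": {"hits": [...]}}
# envelope is an assumption about the full response shape.
response = json.loads("""
{
  "hits": {
    "hits": [
      {
        "_id": "2",
        "_source": {"attrs": [{"name": "username", "value": "johndoe"},
                              {"name": "alias", "value": "myname"}]},
        "inner_hits": {
          "attrs": {
            "hits": {
              "hits": [
                {"_nested": {"field": "attrs", "offset": 0},
                 "_source": {"name": "username", "value": "johndoe"}}
              ]
            }
          }
        }
      }
    ]
  }
}
""")


def matched_nested(response: dict, path: str):
    """Hypothetical helper: yield (parent _id, nested offset, nested _source)
    for every inner hit reported under `path`."""
    for hit in response["hits"]["hits"]:
        inner = hit.get("inner_hits", {}).get(path, {}).get("hits", {}).get("hits", [])
        for nested_hit in inner:
            yield hit["_id"], nested_hit["_nested"]["offset"], nested_hit["_source"]


for parent_id, offset, source in matched_nested(response, "attrs"):
    print(parent_id, offset, source)
# 2 0 {'name': 'username', 'value': 'johndoe'}
```

Note that _source of the parent hit still contains every element of attrs; only inner_hits tells you which element (offset 0 here) satisfied both conditions.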

jaspreet chahal

Must works like "AND", so only documents that satisfy all of its match clauses are returned.

Must will not satisfy point 1 (the document should contain either attrs.name = username OR attrs.value = johndoe); for that you need a should clause, which works like "OR".
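To make the difference concrete, here is a tiny Python model of how a bool query combines its clauses. It is only an illustration under simplifying assumptions: each match clause is reduced to an exact-value check (real match queries analyze the text), the document is given as already-indexed field values, and field_has is a hypothetical helper, not an Elasticsearch API.

```python
from typing import Callable, Dict, List

# A document modelled as field name -> list of indexed values (simplification).
Doc = Dict[str, List[object]]
Clause = Callable[[Doc], bool]


def bool_must(doc: Doc, clauses: List[Clause]) -> bool:
    """must: every clause has to match (logical AND)."""
    return all(clause(doc) for clause in clauses)


def bool_should(doc: Doc, clauses: List[Clause], minimum_should_match: int = 1) -> bool:
    """should with no must/filter siblings: at least minimum_should_match clauses match."""
    return sum(1 for clause in clauses if clause(doc)) >= minimum_should_match


def field_has(field: str, value: object) -> Clause:
    """Hypothetical helper: true if any value indexed under `field` equals `value`."""
    return lambda doc: value in doc.get(field, [])


# Hypothetical document in which only the name condition holds.
doc = {"attrs.name": ["username"], "attrs.value": ["jimihendrix"]}
clauses = [field_has("attrs.name", "username"), field_has("attrs.value", "johndoe")]

print(bool_must(doc, clauses))    # False: the value clause fails
print(bool_should(doc, clauses))  # True: one matching clause is enough
```

With only should clauses in a bool query, at least one of them has to match by default, which is the "OR" of point 1.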

Whether must satisfies point 2 or point 3 depends on the type of the "attrs" field.

If the "attrs" field is of type object, the fields are flattened, so no relationship is maintained between the fields of the individual array elements. The must query will therefore return a document if any attrs.name = "username" and any attrs.value = "johndoe", even if they are not part of the same object in the array.
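The flattening can be sketched in Python as follows. This is a simplified model, not what Lucene literally stores: the helper names flatten_object_array and must_match are hypothetical, and matching is reduced to exact equality instead of analyzed full-text matching. It shows why the document in the question matches the must query when attrs is an object field.

```python
from collections import defaultdict
from typing import Any, Dict, List


def flatten_object_array(doc: Dict[str, Any], path: str) -> Dict[str, List[Any]]:
    """Hypothetical model of how an object array is indexed: one value list
    per dotted field name. Which name belonged with which value is lost."""
    flat: Dict[str, List[Any]] = defaultdict(list)
    for element in doc.get(path, []):
        for key, value in element.items():
            flat[f"{path}.{key}"].append(value)
    return dict(flat)


def must_match(flat: Dict[str, List[Any]], conditions: Dict[str, Any]) -> bool:
    """Every condition must hold somewhere in its flattened field list (AND)."""
    return all(value in flat.get(field, []) for field, value in conditions.items())


# The document from the question.
doc = {
    "key": "value",
    "attrs": [
        {"name": "username", "value": "jimihendrix"},
        {"name": "age", "value": 23},
        {"name": "alias", "value": "johndoe"},
    ],
}

flat = flatten_object_array(doc, "attrs")
print(flat)
# {'attrs.name': ['username', 'age', 'alias'], 'attrs.value': ['jimihendrix', 23, 'johndoe']}
print(must_match(flat, {"attrs.name": "username", "attrs.value": "johndoe"}))  # True
```

"username" comes from the first element and "johndoe" from the third, yet the must query is satisfied, which is interpretation 2.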

If you want each object in the array to act like a separate document, you need to map the field as nested and use a nested query to match documents:

{
  "query": {
    "nested": {
      "path": "attrs",
      "inner_hits": {},
      "query": {
        "bool": {
          "must": [
            {
              "match": {
                "attrs.name": "username"
              }
            },
            {
              "match": {
                "attrs.value": "johndoe"
              }
            }
          ]
        }
      }
    }
  }
}

The hits in the response contain the whole parent documents, including all of their nested objects in _source. To see which nested objects actually matched, you have to specify inner_hits.
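For contrast, the nested behaviour can be modelled by checking each array element on its own. Again this is a hypothetical sketch with exact-equality matching (nested_must_match is not an Elasticsearch API); the returned offsets roughly correspond to the _nested.offset values that inner_hits reports, and the parent document matches when the list is non-empty.

```python
from typing import Any, Dict, List


def nested_must_match(doc: Dict[str, Any], path: str, conditions: Dict[str, Any]) -> List[int]:
    """Hypothetical model of a nested query wrapping a bool/must.

    Each array element is treated as its own hidden document, so all conditions
    have to hold on the same element. Returns the offsets of matching elements.
    """
    matches = []
    for offset, element in enumerate(doc.get(path, [])):
        if all(element.get(field) == value for field, value in conditions.items()):
            matches.append(offset)
    return matches


# The document from the question: name and value match different elements.
question_doc = {
    "attrs": [
        {"name": "username", "value": "jimihendrix"},
        {"name": "age", "value": 23},
        {"name": "alias", "value": "johndoe"},
    ]
}
# A document where one element satisfies both conditions.
answer_doc = {
    "attrs": [
        {"name": "username", "value": "johndoe"},
        {"name": "alias", "value": "myname"},
    ]
}
# Field names are relative to the nested path ("attrs.name" -> "name").
conditions = {"name": "username", "value": "johndoe"}

print(nested_must_match(question_doc, "attrs", conditions))  # []  -> no match
print(nested_must_match(answer_doc, "attrs", conditions))    # [0] -> match
```

This is interpretation 3: the question's document is no longer returned, because no single element has both name = username and value = johndoe.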
