Alex

Reputation: 1947

ElasticSearch array data match multiple properties in nested element with AND condition

I'm facing a problem where I have two documents, each containing an array of objects. I would like to find the one document in which a single nested object matches two properties at the same time, but I always get both documents.

I created the documents with:

POST /respondereval/_doc
{
  "resp_id": "1236",
  "responses": [
     {"key": "meta","text":"abc"},
     {"key": "property 1", "text": "yes"},
     {"key": "property 2", "text": "yes"}
  ]
}

POST /respondereval/_doc
{
  "resp_id": "1237",
  "responses": [
     {"key": "meta","text":"abc"},
     {"key": "property 1", "text": "no"},
     {"key": "property 2", "text": "yes"}
  ]
}

I defined a mapping for the index to prevent ES from flattening the objects, like this:

PUT /respondereval
{
  "mappings" : {
    "properties": {
      "responses" : {
        "type": "nested"
      }
    }
  }
}

I would now like to find the first document (resp_id 1236) with the following query:

GET /respondereval/_search
{
  "query": {
    "nested": {
      "path": "responses",
      "query": {
        "bool": {
          "must": [
            { "match": { "responses.key": "property 1" } },
            { "match": { "responses.text": "yes" } }
          ]
        }
      }
    }
  }
}

This should return only the one document in which a single element matches both conditions at the same time.

Unfortunately, it always returns both documents. I assume it's because at some point ES still flattens the values of the nested object arrays into something like this (simplified):

resp_id 1236: "key": ["meta", "property 1", "property 2"], "text": ["abc", "yes", "yes"]
resp_id 1237: "key": ["meta", "property 1", "property 2"], "text": ["abc", "no", "yes"]

so that both contain property 1 and yes.

What is the correct way to solve this, so that only documents are returned that contain an element in the objects array matching both conditions ("key": "property 1" AND "text": "yes") at the same time?

Upvotes: 0

Views: 1059

Answers (2)

Gibbs

Reputation: 22956

The problem is with your mapping. The key and text fields are mapped as text, which uses the standard analyzer by default.

The standard analyzer splits text into tokens at word boundaries (such as whitespace) and lowercases them. So

property 1 will be tokenised as

{
    "tokens": [
        {
            "token": "property",
            "start_offset": 0,
            "end_offset": 8,
            "type": "<ALPHANUM>",
            "position": 0
        },
        {
            "token": "1",
            "start_offset": 9,
            "end_offset": 10,
            "type": "<NUM>",
            "position": 1
        }
    ]
}
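
The token list above can be reproduced with the _analyze API. This is a minimal sketch that names the standard analyzer explicitly, on the assumption that the fields use the default analysis:

```
POST /_analyze
{
  "analyzer": "standard",
  "text": "property 1"
}
```

Sending the request to /respondereval/_analyze with "field": "responses.key" in place of "analyzer" should instead use whatever analyzer the index actually applies to that field.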

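property 2 is tokenised the same way, into property and 2.

A match query ORs its tokens by default, so a search for property 1 matches any key that contains property or 1. Hence both documents are returned.

In the second document, the third nested object is the one that matches: its text is yes, and its key property 2 matches through the analysed token property.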
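
To check which nested object is responsible for a hit, inner_hits can be added to the nested query. This sketch reuses the query from the question unchanged; the empty inner_hits object just requests the default behaviour:

```
GET /respondereval/_search
{
  "query": {
    "nested": {
      "path": "responses",
      "query": {
        "bool": {
          "must": [
            { "match": { "responses.key": "property 1" } },
            { "match": { "responses.text": "yes" } }
          ]
        }
      },
      "inner_hits": {}
    }
  }
}
```

For resp_id 1237 the inner hits should list the object with key property 2 and text yes, since that is the only element satisfying both match clauses.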

To make it work, query the keyword sub-fields, which are not analysed and only match the exact value:

{
  "query": {
    "nested": {
      "path": "responses",
      "query": {
        "bool": {
          "must": [
            { "match": { "responses.key.keyword": "property 1" } },
            { "match": { "responses.text.keyword": "yes" } }
          ]
        }
      }
    }
  }
}
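
The .keyword sub-fields exist only because dynamic mapping adds them to string fields by default. If the index can be recreated, one option is to map both fields as keyword explicitly and use term queries. This is a sketch under that assumption, not something the question's mapping does; existing field types cannot be changed in place, so the index would have to be deleted or reindexed first:

```
PUT /respondereval
{
  "mappings": {
    "properties": {
      "responses": {
        "type": "nested",
        "properties": {
          "key":  { "type": "keyword" },
          "text": { "type": "keyword" }
        }
      }
    }
  }
}

GET /respondereval/_search
{
  "query": {
    "nested": {
      "path": "responses",
      "query": {
        "bool": {
          "must": [
            { "term": { "responses.key": "property 1" } },
            { "term": { "responses.text": "yes" } }
          ]
        }
      }
    }
  }
}
```

With keyword fields the values are matched exactly and case-sensitively, so this only fits if full-text search on key and text is not needed.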

Alternatively, keep the analysed fields and use a phrase query for the key:

{
  "query": {
    "nested": {
      "path": "responses",
      "query": {
        "bool": {
          "must": [
            { "match_phrase": { "responses.key": "property 1" } }, // phrase query
            { "match": { "responses.text": "yes" } }
          ]
        }
      }
    }
  }
}

Upvotes: 2

Prathap Reddy

Reputation: 1739

Have you tried the must query directly, without the nested query and its path?

{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "responses.key": "property 1"
          }
        },
        {
          "match": {
            "responses.text": "yes"
          }
        }
      ]
    }
  }
}

Upvotes: 0
