House

Reputation: 95

Elasticsearch array only contains query

Let's say I have data in this format:

{
  "id": "doc0",
  "tags": ["a", "b", "c"]
}

{
  "id": "doc1",
  "tags": ["a", "b", "c", "d"]
}

{
  "id": "doc2",
  "tags": ["a", "b"]
}

I need to form an ES query that fetches only the documents whose tags contain both "a" and "b" and nothing else.

If I write a terms query, it matches all the documents, since all of them have both "a" and "b", but only one document has nothing else apart from "a" and "b".

What is the best way to form this query? I don't have the list of the other values, so I can't add a "not_contains" clause.

Upvotes: 0

Views: 435

Answers (2)

Bhavya

Reputation: 16172

There are two ways to achieve this:

  1. You can use a combination of a bool query (with must and filter clauses) and a script query to retrieve only those documents whose tags contain "a" and "b" and nothing else.

Index Data:

POST testidx/_doc/1
{
  "id": "doc0",
  "tags": ["a", "b", "c"]
}

POST testidx/_doc/2
{
  "id": "doc1",
  "tags": ["a", "b", "c", "d"]
}

POST testidx/_doc/3
{
  "id": "doc2",
  "tags": ["a", "b"]
}

Search Query:

POST testidx/_search
{
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "tags": "a"
              }
            },
            {
              "term": {
                "tags": "b"
              }
            },
            {
              "script": {
                "script": {
                  "source": "params.input.containsAll(doc['tags.keyword'])",
                  "lang": "painless",
                  "params": {
                    "input": [
                      "a",
                      "b"
                    ]
                  }
                }
              }
            }
          ]
        }
      }
    }
  }
}

Search Result:

"hits" : [
      {
        "_index" : "testidx",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.0,
        "_source" : {
          "id" : "doc2",
          "tags" : [
            "a",
            "b"
          ]
        }
      }
    ]
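
To make the logic of the three must clauses easier to follow, here is a minimal Python sketch that models them on the sample documents. It does not talk to Elasticsearch; the helper name matches_exactly is made up for illustration, and the sketch assumes the tags behave like a plain list of strings.

```python
# Toy model of the bool filter above; no Elasticsearch client is involved.
docs = [
    {"id": "doc0", "tags": ["a", "b", "c"]},
    {"id": "doc1", "tags": ["a", "b", "c", "d"]},
    {"id": "doc2", "tags": ["a", "b"]},
]


def matches_exactly(doc_tags, wanted):
    """Hypothetical helper mirroring the term clauses plus the script clause."""
    # One term clause per wanted value: every wanted tag must be present.
    has_every_wanted = all(term in doc_tags for term in wanted)
    # Script clause, params.input.containsAll(doc[...]): no tag outside the input.
    has_nothing_else = set(wanted).issuperset(doc_tags)
    return has_every_wanted and has_nothing_else


hits = [d["id"] for d in docs if matches_exactly(d["tags"], ["a", "b"])]
print(hits)  # ['doc2']
```

The term clauses alone would match all three documents, and the script clause alone would also accept a document tagged only "a"; it is the combination that pins the match to exactly the requested tags.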

  2. You can use the minimum_should_match_script parameter with a terms set query. Compared to a script query, a terms set query will be faster.

POST testidx/_search
{
  "query": {
    "bool": {
      "filter": {
        "terms_set": {
          "tags": {
            "terms": [
              "a",
              "b"
            ],
            "minimum_should_match_script": {
              "source": "doc['tags.keyword'].size()"
            }
          }
        }
      }
    }
  }
}
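
One thing worth checking with this variant: the required number of matches is the document's own tag count, so a document tagged only "a" would, as far as I can tell, also be a hit. The Python sketch below models that counting rule under the assumption that a document matches when at least the required number of query terms occur in it. The function names are hypothetical, and the stricter variant assumes a script of the form Math.max(params.num_terms, doc['tags.keyword'].size()), which I have not run against a cluster.

```python
def terms_set_matches(doc_tags, terms, required):
    """Model: a hit needs at least `required` of the query terms in the document."""
    matched = len(set(terms) & set(doc_tags))
    return matched >= required


def subset_variant(doc_tags, terms):
    # Models minimum_should_match_script: doc['tags.keyword'].size()
    return terms_set_matches(doc_tags, terms, len(set(doc_tags)))


def exact_variant(doc_tags, terms):
    # Assumed stricter script: Math.max(params.num_terms, doc['tags.keyword'].size())
    required = max(len(terms), len(set(doc_tags)))
    return terms_set_matches(doc_tags, terms, required)


query_terms = ["a", "b"]
for tags in (["a", "b", "c"], ["a", "b"], ["a"]):
    print(tags, subset_variant(tags, query_terms), exact_variant(tags, query_terms))
```

If every document in the index is known to carry at least the queried tags, the simpler script is enough; otherwise taking the maximum with the number of query terms should rule out the partial matches.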

Upvotes: 2

Sagar Patel

Reputation: 5486

You can use the terms set query.

Before using the terms set query, you need to add a field to each indexed document that stores the number of elements in its tags array.

PUT sample1/_doc/1
{
  "id": "doc0",
  "tags": ["a", "b", "c"],
  "required_matches": 3
}
PUT sample1/_doc/2
{
  "id": "doc1",
  "tags": ["a","b","c","d"],
  "required_matches": 4
}
PUT sample1/_doc/3
{
  "id": "doc2",
  "tags": ["a","b"],
  "required_matches": 2
}
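
If the documents are built in application code before indexing, the count can be derived instead of typed by hand. Here is a minimal Python sketch; the helper name with_required_matches is hypothetical, and counting distinct tags (rather than raw list length) is my assumption, so that a repeated tag does not push the requirement above what the query terms can ever satisfy.

```python
def with_required_matches(doc):
    """Return a copy of doc with required_matches set to its number of distinct tags."""
    enriched = dict(doc)
    enriched["required_matches"] = len(set(doc["tags"]))
    return enriched


sample = [
    {"id": "doc0", "tags": ["a", "b", "c"]},
    {"id": "doc1", "tags": ["a", "b", "c", "d"]},
    {"id": "doc2", "tags": ["a", "b"]},
]

enriched_docs = [with_required_matches(d) for d in sample]
for d in enriched_docs:
    print(d["id"], d["required_matches"])
```

The field has to be kept in sync whenever tags changes, which is the main cost of this approach compared with computing the size in a script at query time.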

Query:

POST sample1/_search
{
  "query": {
    "terms_set": {
      "tags": {
        "terms": [ "a", "b"],
        "minimum_should_match_field": "required_matches"
      }
    }
  }
}

Result:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.17161848,
    "hits" : [
      {
        "_index" : "sample1",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.17161848,
        "_source" : {
          "id" : "doc2",
          "tags" : [
            "a",
            "b"
          ],
          "required_matches" : 2
        }
      }
    ]
  }
}

Upvotes: 0
