vinay

Reputation: 77

Getting Missing value in Elasticsearch

I am looking for a query that tells me which values from a given list are missing from the documents. For example, the documents have a country field with the values USA, Dubai, Singapore and Japan. I want to give Elasticsearch a list of countries (USA, Dubai, Russia) and get back output telling me that Russia is not part of any document. Is this possible?

Upvotes: 0

Views: 1750

Answers (1)

Val

Reputation: 217294

You can run a query like the one below, which selects only the documents whose country is USA, Dubai or Russia and then aggregates on the country values.

POST country/_search
{
  "size": 0,
  "query": {
    "terms": {
      "country": [
        "USA",
        "Dubai",
        "Russia"
      ]
    }
  },
  "aggs": {
    "countries": {
      "terms": {
        "field": "country"
      }
    }
  }
}

In the results, you're going to get buckets for all countries that are present (i.e. USA and Dubai) and no bucket for Russia.

You can then do simple set arithmetic on the client side, subtracting the values returned by the aggregation from the input array, to find what you need, i.e.:

[USA, Dubai, Russia] - [USA, Dubai] = [Russia]
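That subtraction happens in your application code, not in Elasticsearch. Here is a minimal Python sketch of it, assuming the search response has already been parsed into a dict shaped like a regular terms aggregation result. The helper name `missing_values` and the sample response are made up for illustration and are not part of any Elasticsearch client.

```python
def missing_values(requested, response, agg_name="countries"):
    """Return the requested values that have no bucket in the terms aggregation."""
    buckets = response["aggregations"][agg_name]["buckets"]
    present = {bucket["key"] for bucket in buckets}
    # Keep the caller's input order instead of returning an unordered set.
    return [value for value in requested if value not in present]


# Hypothetical response body, trimmed to the fields used here.
sample_response = {
    "aggregations": {
        "countries": {
            "buckets": [
                {"key": "USA", "doc_count": 3},
                {"key": "Dubai", "doc_count": 1},
            ]
        }
    }
}

print(missing_values(["USA", "Dubai", "Russia"], sample_response))  # ['Russia']
```

Keep in mind that a terms aggregation returns only the top 10 buckets by default, so a longer input list needs a larger size on the aggregation, otherwise present values may be reported as missing.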

UPDATE: If you want to do all of the above in a single query, you can leverage the scripted_metric aggregation.

init_script creates an empty map, state.countries, on each shard. map_script then runs once for each matching document on that shard and records every country it sees in state.countries. Note that it reads doc['country.keyword'], so adjust the field name if your mapping differs. combine_script returns each shard's map.

reduce_script runs on the coordinating node and receives the results from all shards. It checks which countries in the params.countries array are present and outputs only those that are not.

POST country/_search
{
  "size": 0,
  "query": {
    "terms": {
      "country": [
        "USA",
        "Dubai",
        "Russia"
      ]
    }
  },
  "aggs": {
    "missing_countries": {
      "scripted_metric": {
        "init_script": "state.countries = [:]",
        "map_script": """
          def country = doc['country.keyword'].value;
          if (!state.countries.containsKey(country)) {
            state.countries[country] = 0;
          }
          state.countries[country]++;
        """,
        "combine_script": """
          return state.countries;
        """,
        "reduce_script": """
          // gather all present countries
          def countries = new HashSet();
          for (state in states) {
            countries.addAll(state.keySet());
          }
          // figure out which country in params is not present in countries
          def missing = [];
          for (country in params.countries) {
            if (!countries.contains(country)) {
              missing.add(country);
            }
          }
          return missing;
        """,
        "params": {
          "countries": ["USA", "Dubai", "Russia"]
        }
      }
    }
  }
}

In this case, the output is going to be:

  "aggregations" : {
    "missing_countries" : {
      "value" : [
        "Russia"
      ]
    }
  }
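If the four script phases are hard to picture, here is a rough pure-Python simulation of the same flow. It is only a sketch: the three-shard layout and the function names are invented for illustration, and nothing here talks to Elasticsearch.

```python
def init_state():
    # Mirrors init_script: every shard starts with an empty map.
    return {"countries": {}}


def map_doc(state, doc):
    # Mirrors map_script: count the country of each document on the shard.
    country = doc["country"]
    state["countries"][country] = state["countries"].get(country, 0) + 1


def combine(state):
    # Mirrors combine_script: the value a shard sends to the coordinating node.
    return state["countries"]


def reduce_states(states, requested):
    # Mirrors reduce_script: union the keys from all shards, then keep
    # only the requested countries that never showed up.
    present = set()
    for shard_counts in states:
        present.update(shard_counts.keys())
    return [country for country in requested if country not in present]


# Hypothetical distribution of the matching documents over three shards.
shards = [
    [{"country": "USA"}, {"country": "USA"}],
    [{"country": "Dubai"}],
    [],
]

states = []
for docs in shards:
    state = init_state()
    for doc in docs:
        map_doc(state, doc)
    states.append(combine(state))

print(reduce_states(states, ["USA", "Dubai", "Russia"]))  # ['Russia']
```

The per-country counts are not needed for the final answer. Only the keys of each shard's map matter to the reduce step, which is why it collects them into a set.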

Upvotes: 1
