Reputation: 77
I am looking for a query that returns the values from a given list that are missing from all documents. For example, the documents have a country field with the values USA, Dubai, Singapore and Japan. I want to give Elasticsearch a list of countries (USA, Dubai, Russia) and get back output telling me that Russia is not part of any document. Is this possible?
Upvotes: 0
Views: 1750
Reputation: 217294
You need a query like the one below, which selects only the documents whose country is USA, Dubai or Russia and then aggregates on the country values.
{
  "size": 0,
  "query": {
    "terms": {
      "country": [
        "USA",
        "Dubai",
        "Russia"
      ]
    }
  },
  "aggs": {
    "countries": {
      "terms": {
        "field": "country"
      }
    }
  }
}
In the results, you'll get a bucket for each country that is present (i.e. USA and Dubai) and no bucket for Russia.
You can then do simple set arithmetic: subtract the countries returned by the aggregation from the input array to find the missing ones, i.e.:
[USA, Dubai, Russia] - [USA, Dubai] = [Russia]
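The subtraction step happens on the client side. Here is a minimal Python sketch of it, assuming the search response has already been parsed into a dict. The helper name `missing_values` and the sample response (including its `doc_count` values) are made up for illustration; only the `aggregations` / `countries` / `buckets` / `key` layout follows the query above.

```python
def missing_values(requested, response, agg_name="countries"):
    """Return the requested values that have no bucket in a terms aggregation response."""
    buckets = response["aggregations"][agg_name]["buckets"]
    present = {bucket["key"] for bucket in buckets}
    # Keep the caller's order instead of returning an unordered set.
    return [value for value in requested if value not in present]


# Hypothetical response shaped like a terms aggregation result;
# the doc_count values are invented for this example.
sample_response = {
    "aggregations": {
        "countries": {
            "buckets": [
                {"key": "USA", "doc_count": 3},
                {"key": "Dubai", "doc_count": 1},
            ]
        }
    }
}

print(missing_values(["USA", "Dubai", "Russia"], sample_response))
```

Note that a terms aggregation returns only 10 buckets by default, so for a longer input list you would probably want to set its "size" to at least the number of values you pass in.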
UPDATE: If you want to do all of the above in a single query, you can leverage the scripted_metric aggregation.
map_script runs for each document on a shard and stores all countries that are present in the temporary variable state.countries.
combine_script runs once per shard and returns that shard's state.countries.
reduce_script runs on the coordinating node and receives the results of all shards. That script simply checks which countries in the params.countries array are present and outputs only those that are not.
POST country/_search
{
  "size": 0,
  "query": {
    "terms": {
      "country": [
        "USA",
        "Dubai",
        "Russia"
      ]
    }
  },
  "aggs": {
    "missing_countries": {
      "scripted_metric": {
        "init_script": "state.countries = [:]",
        "map_script": """
          def country = doc['country.keyword'].value;
          if (!state.countries.containsKey(country)) {
            state.countries[country] = 0;
          }
          state.countries[country]++;
        """,
        "combine_script": """
          return state.countries;
        """,
        "reduce_script": """
          // gather all present countries
          def countries = new HashSet();
          for (state in states) {
            countries.addAll(state.keySet());
          }
          // figure out which country in params is not present in countries
          def missing = [];
          for (country in params.countries) {
            if (!countries.contains(country)) {
              missing.add(country);
            }
          }
          return missing;
        """,
        "params": {
          "countries": ["USA", "Dubai", "Russia"]
        }
      }
    }
  }
}
In this case, the output is going to be:
"aggregations" : {
  "missing_countries" : {
    "value" : [
      "Russia"
    ]
  }
}
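To see how the phases of the scripted_metric aggregation fit together, here is a rough simulation in plain Python. It is only a sketch of the control flow, not Elasticsearch code: the shard layout, the sample documents, and the `run` driver are all assumptions made for illustration, and each "shard" is assumed to hold only documents that already matched the terms query.

```python
def init_script():
    # One fresh state per shard, like "state.countries = [:]".
    return {"countries": {}}


def map_script(state, doc):
    # Called once per matching document on a shard: count each country seen.
    country = doc["country"]
    state["countries"][country] = state["countries"].get(country, 0) + 1


def combine_script(state):
    # Called once per shard: hand the per-shard result to the coordinating node.
    return state["countries"]


def reduce_script(states, params):
    # Called once on the coordinating node with one entry per shard.
    present = set()
    for shard_result in states:
        present.update(shard_result.keys())
    return [c for c in params["countries"] if c not in present]


def run(shards, params):
    # Hypothetical driver that mimics the order in which the scripts are invoked.
    states = []
    for docs in shards:
        state = init_script()
        for doc in docs:
            map_script(state, doc)
        states.append(combine_script(state))
    return reduce_script(states, params)


# Two made-up shards holding the documents that matched the query.
shards = [
    [{"country": "USA"}, {"country": "Dubai"}],
    [{"country": "USA"}],
]
print(run(shards, {"countries": ["USA", "Dubai", "Russia"]}))
```

The per-country counts are not needed for the final answer; only the keys are used in the reduce step, which mirrors what the Painless scripts above do.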
Upvotes: 1