André Leite

How can I aggregate in Elasticsearch only the values that occur in both indices?

How can I run a search in Elasticsearch across two indices that aggregates only the values that occur in both indices?

For instance:

GET indexA,indexB/_search 
{
  "aggs": {
    "myField": {
      "terms": {
        "field": "myField"
      }
    }
  }
}

This returns every value that myField has in either index (indexA or indexB). How can I change it so that it only returns the values that appear in both indexA and indexB?

To clarify, if myField has values value1, value2 and value3 in indexA but it only has value1 and value2 in indexB, my search would only show value1 and value2.
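In other words, the desired result is the set intersection of the myField values found in each index. A minimal pure-Python illustration of that semantics, using the hypothetical values from the example above rather than a real cluster:

```python
# Hypothetical values of myField per index, taken from the example above.
values_by_index = {
    "indexA": {"value1", "value2", "value3"},
    "indexB": {"value1", "value2"},
}

# Values present in every index = intersection of the per-index sets.
common = set.intersection(*values_by_index.values())
print(sorted(common))  # ['value1', 'value2']
```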

Answers (1)

Andrei Stefan

You can do it like this (and you need Elasticsearch 2.x):

GET indexA,indexB/_search
{
  "size": 0,
  "aggs": {
    "myField": {
      "terms": {
        "field": "myField"
      },
      "aggs": {
        "count_indices": {
          "cardinality": {
            "field": "_index"
          }
        },
        "values_bucket_filter_by_index_count": {
          "bucket_selector": {
            "buckets_path": {
              "count": "count_indices"
            },
            "script": "count >= 2"
          }
        }
      }
    }
  }
}

With "terms": {"field": "myField"} you get one bucket per unique myField value. Then, as a sub-aggregation, "cardinality": {"field": "_index"} counts how many distinct indices contain that value, and the final pipeline aggregation, values_bucket_filter_by_index_count, keeps only the buckets whose value appears in at least two indices. Note that the terms aggregation returns only the top 10 values by default, so increase its size if myField has more distinct values.
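To make the three steps concrete, here is a rough pure-Python simulation of what the aggregation computes. It is not how Elasticsearch executes it internally; the documents are hypothetical (index name, myField value) pairs, and the `min_indices` parameter of the hypothetical `common_values` function mirrors the `count >= 2` threshold in the script.

```python
from collections import defaultdict

# Hypothetical documents: (index the doc lives in, value of myField).
docs = [
    ("indexA", "value1"),
    ("indexA", "value2"),
    ("indexA", "value3"),
    ("indexB", "value1"),
    ("indexB", "value2"),
]

def common_values(docs, min_indices=2):
    # Step 1 (terms): one bucket per distinct myField value.
    buckets = defaultdict(lambda: {"doc_count": 0, "indices": set()})
    for index, value in docs:
        buckets[value]["doc_count"] += 1
        # Step 2 (cardinality on _index): track the distinct indices per bucket.
        buckets[value]["indices"].add(index)
    # Step 3 (bucket_selector): keep buckets seen in at least min_indices indices.
    return {
        value: {"doc_count": b["doc_count"], "count_indices": len(b["indices"])}
        for value, b in sorted(buckets.items())
        if len(b["indices"]) >= min_indices
    }

print(common_values(docs))
# {'value1': {'doc_count': 2, 'count_indices': 2}, 'value2': {'doc_count': 2, 'count_indices': 2}}
```

This is why the filter uses the cardinality of _index rather than doc_count: a value stored in two documents of indexA only has a doc_count of 2 but a count_indices of 1, so it is dropped.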

The aggregations part of the response then looks like this:

   "aggregations": {
      "myField": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "value1",
               "doc_count": 2,
               "count_indices": {
                  "value": 2
               }
            },
            {
               "key": "value2",
               "doc_count": 2,
               "count_indices": {
                  "value": 2
               }
            }
         ]
      }
   }

As mentioned above, the bucket_selector aggregation requires Elasticsearch 2.0 or later.
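On Elasticsearch 5.0 and later, scripts default to Painless and bucket_selector exposes the buckets_path variables through `params`, so the script would read `params.count >= 2` instead of `count >= 2`. As a sketch assuming such a newer cluster, the hypothetical helper below builds the same request body as a Python dict and sets the threshold to the number of indices searched, so that "in all indices" still holds for more than two. It only builds the JSON; sending it is left to whatever client you use.

```python
import json

def build_common_values_query(field, indices):
    """Hypothetical helper: request body that keeps only values of `field`
    occurring in every one of `indices` (Painless script syntax, ES 5.0+)."""
    return {
        "size": 0,
        "aggs": {
            field: {
                "terms": {"field": field},
                "aggs": {
                    "count_indices": {"cardinality": {"field": "_index"}},
                    "values_bucket_filter_by_index_count": {
                        "bucket_selector": {
                            "buckets_path": {"count": "count_indices"},
                            # Keep a bucket only if every searched index contains the value.
                            "script": f"params.count >= {len(indices)}",
                        }
                    },
                },
            }
        },
    }

indices = ["indexA", "indexB"]
print("GET " + ",".join(indices) + "/_search")
print(json.dumps(build_common_values_query("myField", indices), indent=2))
```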
