Gavin Gilmour

Reputation: 6963

Counting non-unique items in an Elasticsearch aggregation?

I'm trying to use an Elasticsearch aggregation to return all non-unique counts for each term within a bucket.

Given a mapping:-

{
  "properties": {
    "addresses": {
      "properties": {
        "meta": {
          "properties": {
            "types": {
              "properties": {
                "type": {
                  "type": "keyword"
                }
              }
            }
          }
        }
      }
    }
  }
}

And a document:-

{
  "id": 3,
  "first_name": "James",
  "last_name": "Smith",
  "addresses": [
    {
      "meta": {
        "types": [
          {
            "type": "Home"
          },
          {
            "type": "Home"
          },
          {
            "type": "Business"
          },
          {
            "type": "Business"
          },
          {
            "type": "Business"
          },
          {
            "type": "Fax"
          }
        ]
      }
    }
  ]
}

The following terms aggregation:-

GET /test/_search
{
  "size": 0,
  "query": {
    "match": {
      "id": 3
    }
  },
  "aggs": {
    "types": {
      "terms": {
        "field": "addresses.meta.types.type"
      }
    }
  }
}

Gives this result:-

  "aggregations" : {
    "types" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "Business",
          "doc_count" : 1
        },
        {
          "key" : "Fax",
          "doc_count" : 1
        },
        {
          "key" : "Home",
          "doc_count" : 1
        }
      ]
    }
  }

As you can see, each term is counted only once, whereas I'm really after the total count of each, e.g. Home: 2, Business: 3 and Fax: 1.

Is this possible?

I had a look at value_count, but as it's not a bucket aggregation it seems a little less convenient to use. Alternatively, a script might possibly do it, but I'm not too sure of the syntax.

Thanks!

Upvotes: 2

Views: 522

Answers (1)

Kamal Kunjapur

Reputation: 8840

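I doubt that is possible using the object type in Elasticsearch. The reason is that a bucket aggregation such as terms reports, for each term, the number of documents that contain it, not the number of times the term occurs within a document. With the object type, all six entries of types belong to the same single document, so every term gets a doc_count of 1.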
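To make the counting behaviour concrete, here is a rough Python sketch of it. This is only an illustration of the semantics, not Elasticsearch code: the function name `flattened_terms_doc_count` is made up, and the use of a set is my simplified model of how an array of objects collapses into one multi-valued field per document.

```python
from collections import Counter

# The document from the question, reduced to the field being aggregated.
doc = {
    "id": 3,
    "addresses": [
        {"meta": {"types": [
            {"type": "Home"}, {"type": "Home"},
            {"type": "Business"}, {"type": "Business"}, {"type": "Business"},
            {"type": "Fax"},
        ]}}
    ],
}

def flattened_terms_doc_count(docs):
    """Hypothetical helper: per term, count the documents containing it."""
    counts = Counter()
    for d in docs:
        # Assumption: with the object type the array is flattened into one
        # multi-valued field, so repeats inside a document count only once.
        values = {
            entry["type"]
            for address in d["addresses"]
            for entry in address["meta"]["types"]
        }
        counts.update(values)
    return dict(counts)

result = flattened_terms_doc_count([doc])
print(result)
```

Each of the three terms ends up with a count of 1, which matches the doc_count values in the question.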

You may have to change the mapping of the types field to nested (the "type":"nested" line in the mapping below) so that ES ends up indexing each entry inside types as a separate document.

I've provided a sample mapping, document (no change in representation), aggregation query and response below.

Sample Mapping:

PUT nested_test
{ 
   "mappings":{ 
      "properties":{ 
         "id":{ 
            "type":"integer"
         },
         "first_name":{ 
            "type":"text",
            "fields":{ 
               "keyword":{ 
                  "type":"keyword"
               }
            }
         },
         "last_name":{ 
            "type":"text",
            "fields":{ 
               "keyword":{ 
                  "type":"keyword"
               }
            }
         },
         "addresses":{ 
            "properties":{ 
               "meta":{ 
                  "properties":{ 
                     "types":{ 
                        "type":"nested",
                        "properties":{ 
                           "type":{ 
                              "type":"keyword"
                           }
                        }
                     }
                  }
               }
            }
         }
      }
   }
}

Sample Document (No change)

POST nested_test/_doc/1
{
  "id": 3,
  "first_name": "James",
  "last_name": "Smith",
  "addresses": [
    {
      "meta": {
        "types": [
          {
            "type": "Home"
          },
          {
            "type": "Home"
          },
          {
            "type": "Business"
          },
          {
            "type": "Business"
          },
          {
            "type": "Business"
          },
          {
            "type": "Fax"
          }
        ]
      }
    }
  ]
}

Note that every type above is now considered as a separate document linked to the main document.
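As a rough picture of what that means, the Python sketch below splits the parent document into one small record per entry of types and counts those records. It only models the idea; `nested_docs` is a made-up helper and nothing here talks to Elasticsearch.

```python
from collections import Counter

# Same document as above, reduced to the nested field.
doc = {
    "id": 3,
    "addresses": [
        {"meta": {"types": [
            {"type": "Home"}, {"type": "Home"},
            {"type": "Business"}, {"type": "Business"}, {"type": "Business"},
            {"type": "Fax"},
        ]}}
    ],
}

def nested_docs(document):
    """Hypothetical helper: one separate record per entry of `types`."""
    return [
        {"type": entry["type"]}
        for address in document["addresses"]
        for entry in address["meta"]["types"]
    ]

children = nested_docs(doc)

# Counting records per term now counts every occurrence, repeats included.
counts = dict(Counter(child["type"] for child in children))
print(len(children), counts)
```

The six records here correspond to the outer doc_count of 6 that the nested aggregation reports in the response further down.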

Aggregation Query:

All that is required is a Nested Aggregation with a Terms Aggregation inside it:

POST nested_test/_search
{
  "size": 0,
  "aggs": {
    "myterms": {
      "nested": {
        "path": "addresses.meta.types"
      },
      "aggs": {
        "myterms": {
          "terms": {
            "field": "addresses.meta.types.type",
            "size": 10,
            "min_doc_count": 2
          }
        }
      }
    }
  }
}

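Note that in the above query I've set min_doc_count to 2 so that only types occurring more than once (the non-unique ones) are returned, which is why Fax is missing from the response below. If you want the total count of every type, including Fax: 1, simply leave min_doc_count out (it defaults to 1).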
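The effect of min_doc_count can be sketched in a few lines of Python. This is only a model of the filtering step, with a made-up function name `terms_buckets`; the ordering (highest count first, then key) is my assumption about the default terms ordering.

```python
def terms_buckets(counts, min_doc_count=1):
    """Hypothetical helper: keep terms with at least `min_doc_count` hits."""
    kept = [(key, n) for key, n in counts.items() if n >= min_doc_count]
    # Assumed ordering: doc_count descending, ties broken by key ascending.
    kept.sort(key=lambda item: (-item[1], item[0]))
    return [{"key": key, "doc_count": n} for key, n in kept]

# Per-entry counts for the sample document once `types` is nested.
nested_counts = {"Home": 2, "Business": 3, "Fax": 1}

non_unique = terms_buckets(nested_counts, min_doc_count=2)
everything = terms_buckets(nested_counts)  # min_doc_count left at 1
print(non_unique)
print(everything)
```

With min_doc_count set to 2 only Business and Home survive; with the value left at 1, Fax is included as well.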

Response:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "myterms" : {
      "doc_count" : 6,
      "myterms" : {
        "doc_count_error_upper_bound" : 0,
        "sum_other_doc_count" : 0,
        "buckets" : [
          {
            "key" : "Business",
            "doc_count" : 3
          },
          {
            "key" : "Home",
            "doc_count" : 2
          }
        ]
      }
    }
  }
}
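If you consume this response from application code, the counts sit two levels deep because the nested aggregation and the terms aggregation inside it are both named myterms. A minimal Python sketch for pulling them out, assuming the response has already been parsed into a dict (the helper name `bucket_counts` is made up):

```python
def bucket_counts(response, outer="myterms", inner="myterms"):
    """Hypothetical helper: map each bucket key to its doc_count."""
    buckets = response["aggregations"][outer][inner]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}

# The relevant part of the response shown above.
response = {
    "aggregations": {
        "myterms": {
            "doc_count": 6,
            "myterms": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                    {"key": "Business", "doc_count": 3},
                    {"key": "Home", "doc_count": 2},
                ],
            },
        }
    }
}

type_counts = bucket_counts(response)
print(type_counts)
```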

Hope that helps!

Upvotes: 3
