user884424

Reputation: 583

cardinality aggregation within filter aggregation

I am trying to get the count of distinct values using a cardinality aggregation.

Here is my query:

{
    "size": 100,
    "_source":["awardeeName"],
    "query": {
        "match_phrase":{"awardeeName" :"The President and Fellows of Harvard College" }  
    },
    "aggs":{
        "awardeeName": {
            "filter" : { "query": { "match_phrase":{"awardeeName" :"The President and Fellows of Harvard College" }}},
            "aggs": {
                "distinct":{"cardinality":{  "field": "awardeeName"}}
           }
        }

    }               
}

The query uses match_phrase for some text, the aggregation filters on the same match_phrase, and then cardinality is applied to the filtered documents. In the result, the total hits and the filter aggregation's doc_count match, but cardinality shows a different number, surprisingly larger than both. Here is the result:

  {
    "took": 37,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 3,
        "max_score": 13.516766,
        "hits": [
            {
                "_index": "development",
                "_type": "document",
                "_id": "140a3f5b-e876-4542-b16d-56c3c5ae0e58",
                "_score": 13.516766,
                "_source": {
                    "awardeeName": "The President and Fellows of Harvard College"
                }
            },
            {
                "_index": "development",
                "_type": "document",
                "_id": "5c668b06-c612-4349-8735-2a79ee2bb55e",
                "_score": 12.913888,
                "_source": {
                    "awardeeName": "The President and Fellows of Harvard College"
                }
            },
            {
                "_index": "development",
                "_type": "document",
                "_id": "a9560519-1b2a-4e64-b85f-4645a41d5810",
                "_score": 12.913888,
                "_source": {
                    "awardeeName": "The President and Fellows of Harvard College"
                }
            }
        ]
    },
    "aggregations": {
        "awardeeName": {
            "doc_count": 3,
            "distinct": {
                "value": 7
            }
        }
    }
}

I expected cardinality to apply to the results of the filter, but in this case cardinality shows 7. Why is it showing 7? How can the count of distinct values exceed the total hit count?

Upvotes: 1

Views: 4284

Answers (2)

Woody Sun

Reputation: 379

Example of a cardinality aggregation on a keyword sub-field:

GET calserver-2021.04.1*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "method.keyword": "searchUser"
          }
        },
        {
          "term": {
            "statusCode": "500"
          }
        }
      ]
    }
  },
  "aggs": {
    "username_count": {
      "cardinality": {
        "field": "username.keyword",
        "precision_threshold": 40000
      }
    }
  }
}

Upvotes: 0

Val

Reputation: 217254

The cardinality aggregation on the awardeeName field counts the number of distinct tokens indexed in that field across all matching documents, not the number of distinct whole values.

In your case, the awardeeName field contains the exact same value, "The President and Fellows of Harvard College", in all three matching documents. That value is analyzed into exactly 7 tokens, hence the result of 7 you see.
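One way to check the token count is to run the phrase through the _analyze API. This is only a sketch: the question does not show the mapping, so the standard analyzer (the default for analyzed string fields) is an assumption here.

```
GET _analyze
{
  "analyzer": "standard",
  "text": "The President and Fellows of Harvard College"
}
```

With the standard analyzer, the response should list seven lowercased tokens (the, president, and, fellows, of, harvard, college), which lines up with the cardinality of 7.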

What you probably want is to count "The President and Fellows of Harvard College" as a single token. For that, you need a keyword field (instead of a text one), and you need to use that field in your cardinality aggregation.
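A minimal sketch of such a query follows. It assumes a keyword sub-field named awardeeName.keyword exists; that name is hypothetical for your index (dynamic mapping creates a sub-field with this name for string values in Elasticsearch 5 and later, but an explicit mapping may name it differently or lack it, in which case the mapping has to be updated and the data reindexed). The index name development is taken from the result in the question.

```
GET development/_search
{
  "size": 0,
  "query": {
    "match_phrase": {
      "awardeeName": "The President and Fellows of Harvard College"
    }
  },
  "aggs": {
    "distinct": {
      "cardinality": {
        "field": "awardeeName.keyword"
      }
    }
  }
}
```

Aggregations already run only on the documents matched by the query, so the extra filter aggregation repeating the same match_phrase is not needed. For the three documents shown in the question, which all hold the same value, this should report a cardinality of 1.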

Upvotes: 2
