Reputation: 583
I am trying to get a count of distinct values using a cardinality aggregation.
Here is my query:
{
  "size": 100,
  "_source": ["awardeeName"],
  "query": {
    "match_phrase": { "awardeeName": "The President and Fellows of Harvard College" }
  },
  "aggs": {
    "awardeeName": {
      "filter": { "query": { "match_phrase": { "awardeeName": "The President and Fellows of Harvard College" } } },
      "aggs": {
        "distinct": { "cardinality": { "field": "awardeeName" } }
      }
    }
  }
}
The query uses match_phrase on some text. The aggregation applies a filter with the same match_phrase, and a cardinality sub-aggregation runs inside that filter. In the result, the hit count and the filter's doc_count match, but cardinality shows a different number, surprisingly larger than both the filter count and the total hits. Here is the result:
{
  "took": 37,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 13.516766,
    "hits": [
      {
        "_index": "development",
        "_type": "document",
        "_id": "140a3f5b-e876-4542-b16d-56c3c5ae0e58",
        "_score": 13.516766,
        "_source": {
          "awardeeName": "The President and Fellows of Harvard College"
        }
      },
      {
        "_index": "development",
        "_type": "document",
        "_id": "5c668b06-c612-4349-8735-2a79ee2bb55e",
        "_score": 12.913888,
        "_source": {
          "awardeeName": "The President and Fellows of Harvard College"
        }
      },
      {
        "_index": "development",
        "_type": "document",
        "_id": "a9560519-1b2a-4e64-b85f-4645a41d5810",
        "_score": 12.913888,
        "_source": {
          "awardeeName": "The President and Fellows of Harvard College"
        }
      }
    ]
  },
  "aggregations": {
    "awardeeName": {
      "doc_count": 3,
      "distinct": {
        "value": 7
      }
    }
  }
}
I expect cardinality to apply to the results of the filter, but here it shows 7. Why is it showing 7? How can the distinct value count exceed the total hit count?
Upvotes: 1
Reputation: 379
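Example (note that the cardinality aggregation runs on the username.keyword sub-field, not on the analyzed text field):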
GET calserver-2021.04.1*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "method.keyword": "searchUser"
          }
        },
        {
          "term": {
            "statusCode": "500"
          }
        }
      ]
    }
  },
  "aggs": {
    "username_count": {
      "cardinality": {
        "field": "username.keyword",
        "precision_threshold": 40000
      }
    }
  }
}
Upvotes: 0
Reputation: 217254
The cardinality aggregation on the awardeeName field counts the number of distinct tokens present in that field across all matching documents, not the number of distinct full values.
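In your case, the awardeeName field of all three matching documents contains exactly the same value, The President and Fellows of Harvard College, which is split into exactly 7 tokens (assuming the default standard analyzer: the, president, and, fellows, of, harvard, college), hence the result of 7 you see. Because a single document contributes several tokens, the distinct count can exceed the number of hits.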
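One way to check which tokens the cardinality aggregation is counting is the _analyze API. The request below is a sketch that assumes the index is named development (taken from the _index values in the question's response) and that awardeeName is an analyzed text field; passing "field" tells Elasticsearch to use that field's own analyzer rather than a named one.

```json
# Assumption: index "development", awardeeName mapped as an analyzed text field
GET development/_analyze
{
  "field": "awardeeName",
  "text": "The President and Fellows of Harvard College"
}
```

If the field uses the default standard analyzer, the response should list seven lowercased tokens, one for each distinct value the cardinality aggregation counted.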
What you probably want is to count The President and Fellows of Harvard College as a single value. For that you need a keyword field (instead of a text one), and you should point your cardinality aggregation at that field.
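A minimal sketch of that change, assuming Elasticsearch 7.x or later (typeless mappings), the development index from the question, and that awardeeName currently uses the default analyzer. The sub-field name keyword is an assumption; on 5.x and later, dynamic mapping of string fields usually creates awardeeName.keyword automatically, in which case only the final search is needed. Documents indexed before the mapping change do not get the new sub-field until they are reindexed, which is what the _update_by_query call is for.

```json
# Add a keyword multi-field next to the existing text field (assumed sub-field name: "keyword")
PUT development/_mapping
{
  "properties": {
    "awardeeName": {
      "type": "text",
      "fields": {
        "keyword": { "type": "keyword", "ignore_above": 256 }
      }
    }
  }
}

# Re-index existing documents in place so the new sub-field gets populated
POST development/_update_by_query?conflicts=proceed

# Count distinct whole values instead of distinct tokens
GET development/_search
{
  "size": 0,
  "query": {
    "match_phrase": { "awardeeName": "The President and Fellows of Harvard College" }
  },
  "aggs": {
    "distinct_awardees": {
      "cardinality": { "field": "awardeeName.keyword" }
    }
  }
}
```

The filter aggregation from the question is dropped here because the top-level query already restricts the matching documents. With the three documents shown above, all holding the same value, the cardinality on awardeeName.keyword should report 1.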
Upvotes: 2