Reputation: 847
I am learning Elasticsearch and would like to count distinct values. So far I can count values, but not distinct ones.
Here is the sample data:
curl http://localhost:9200/store/item/ -XPOST -d '{
"RestaurantId": 2,
"RestaurantName": "Restaurant Brian",
"DateTime": "2013-08-16T15:13:47.4833748+01:00"
}'
curl http://localhost:9200/store/item/ -XPOST -d '{
"RestaurantId": 1,
"RestaurantName": "Restaurant Cecil",
"DateTime": "2013-08-16T15:13:47.4833748+01:00"
}'
curl http://localhost:9200/store/item/ -XPOST -d '{
"RestaurantId": 1,
"RestaurantName": "Restaurant Cecil",
"DateTime": "2013-08-16T15:13:47.4833748+01:00"
}'
And what I tried so far:
curl -XPOST "http://localhost:9200/store/item/_search" -d '{
"size": 0,
"aggs": {
"item": {
"terms": {
"field": "RestaurantName"
}
}
}
}'
Output:
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0.0,
"hits": []
},
"aggregations": {
"item": {
"buckets": [
{
"key": "restaurant",
"doc_count": 3
},
{
"key": "cecil",
"doc_count": 2
},
{
"key": "brian",
"doc_count": 1
}
]
}
}
}
How can I get the count of cecil as 1 instead of 2?
Upvotes: 23
Views: 61463
Reputation: 382
Use the cardinality aggregation. Docs: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html
Example :
"aggs": {
"uniqueValues": {
"cardinality": {
"field": "ourUniqueId.keyword",
"precision_threshold": 100
}
}
}
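For reference, here is a minimal Python sketch of how this fragment might fit into a full search body and how the result is read back. The field name `RestaurantName.keyword` assumes the mapping has a keyword sub-field, the aggregation name is arbitrary, and the sample response is illustrative rather than real output.

```python
import json


def build_distinct_count_query(field, agg_name="uniqueValues", precision_threshold=100):
    """Build a search body that asks only for an approximate distinct count."""
    return {
        "size": 0,  # skip the hits; only the aggregation result is needed
        "aggs": {
            agg_name: {
                "cardinality": {
                    "field": field,
                    "precision_threshold": precision_threshold,
                }
            }
        },
    }


def read_distinct_count(response, agg_name="uniqueValues"):
    """A cardinality aggregation reports its result under the 'value' key."""
    return response["aggregations"][agg_name]["value"]


# Field name is an assumption; adjust it to the actual mapping.
body = build_distinct_count_query("RestaurantName.keyword")
print(json.dumps(body, indent=2))

# Illustrative response (assumed), trimmed to the relevant part.
sample_response = {"aggregations": {"uniqueValues": {"value": 2}}}
print(read_distinct_count(sample_response))
```

The printed body can be sent as the `-d` payload of a `_search` request like the one in the question.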
Upvotes: 1
Reputation: 107
It is too late to answer this for the original author, but it may help anybody who faces the same issue and lands here.
Elasticsearch does provide the cardinality aggregation for distinct counts, but its result is approximate, not exact. When accuracy matters, a different approach is needed. I have written an article on this which might help: Accurate Distinct Count and Values from Elasticsearch.
Upvotes: 5
Reputation: 5552
You have to use the cardinality aggregation, as mentioned by @coder. It is described in the docs.
$ curl -XGET "http://localhost:9200/store/item/_search" -d'
{
"aggs" : {
"restaurant_count" : {
"cardinality" : {
"field" : "RestaurantName",
"precision_threshold": 100,
"rehash": false
}
}
}
}'
This worked for me ...
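One caveat, sketched in Python: the output in the question shows buckets such as "restaurant" and "cecil", which suggests that RestaurantName is an analyzed field. In that case an aggregation sees individual tokens, not whole names. The tokenizer below is only a rough stand-in for the real analyzer (lowercase, split on whitespace), used here purely for illustration.

```python
from collections import Counter

# The three sample documents from the question.
docs = [
    {"RestaurantId": 2, "RestaurantName": "Restaurant Brian"},
    {"RestaurantId": 1, "RestaurantName": "Restaurant Cecil"},
    {"RestaurantId": 1, "RestaurantName": "Restaurant Cecil"},
]


def tokens(text):
    # Rough approximation of an analyzer: lowercase and split on whitespace.
    return text.lower().split()


# Analyzed field: a document is counted once for each distinct token it contains.
token_doc_counts = Counter(
    token for doc in docs for token in set(tokens(doc["RestaurantName"]))
)

# Not-analyzed field: the whole value is treated as a single term.
name_doc_counts = Counter(doc["RestaurantName"] for doc in docs)

print(dict(token_doc_counts))  # per-token document counts
print(len(token_doc_counts))   # number of distinct tokens
print(len(name_doc_counts))    # number of distinct names
```

On the sample data this gives three distinct tokens but only two distinct names, so pointing the aggregation at a not-analyzed field (or a keyword sub-field, if the mapping has one) should be what yields the expected count.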
Upvotes: 14
Reputation: 569
There's no support for exact distinct counting in Elasticsearch, although approximate counting exists. Use a "terms" aggregation and count the buckets in the result. See the "Count distinct on elastic search" question.
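A minimal Python sketch of the bucket-counting approach follows. It assumes the terms aggregation runs on a not-analyzed field, so that each bucket key is a whole name, and that `size` is set high enough to return every bucket, since a terms aggregation only returns the top buckets (10 by default). The field name and the sample response are illustrative assumptions.

```python
def build_terms_query(field, agg_name="item", max_buckets=1000):
    """Build a search body whose terms aggregation returns up to max_buckets buckets."""
    return {
        "size": 0,  # no hits needed, only the buckets
        "aggs": {agg_name: {"terms": {"field": field, "size": max_buckets}}},
    }


def count_distinct(response, agg_name="item"):
    """Each bucket is one distinct term, so the bucket count is the distinct count."""
    return len(response["aggregations"][agg_name]["buckets"])


# Field name is an assumption; it must not be analyzed.
body = build_terms_query("RestaurantName.keyword")

# Illustrative response (assumed) for the three sample documents.
sample_response = {
    "aggregations": {
        "item": {
            "buckets": [
                {"key": "Restaurant Cecil", "doc_count": 2},
                {"key": "Restaurant Brian", "doc_count": 1},
            ]
        }
    }
}
print(count_distinct(sample_response))
```

The result is only exact if the number of distinct terms does not exceed `max_buckets`, so this suits fields with a modest number of distinct values.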
Upvotes: 0
Reputation: 1941
You could use cardinality here: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html
Upvotes: 5