Developer

Reputation: 847

Count distinct values using Elasticsearch

I am learning Elasticsearch and would like to count distinct values. So far I can count values, but not distinct ones.

Here is the sample data:

curl http://localhost:9200/store/item/ -XPOST -d '{
  "RestaurantId": 2,
  "RestaurantName": "Restaurant Brian",
  "DateTime": "2013-08-16T15:13:47.4833748+01:00"
}'

curl http://localhost:9200/store/item/ -XPOST -d '{
  "RestaurantId": 1,
  "RestaurantName": "Restaurant Cecil",
  "DateTime": "2013-08-16T15:13:47.4833748+01:00"
}'

curl http://localhost:9200/store/item/ -XPOST -d '{
  "RestaurantId": 1,
  "RestaurantName": "Restaurant Cecil",
  "DateTime": "2013-08-16T15:13:47.4833748+01:00"
}'

And what I tried so far:

curl -XPOST "http://localhost:9200/store/item/_search" -d '{
  "size": 0,
  "aggs": {
    "item": {
      "terms": {
        "field": "RestaurantName"
      }
    }
  }
}'

Output:

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.0,
    "hits": []
  },
  "aggregations": {
    "item": {
      "buckets": [
        {
          "key": "restaurant",
          "doc_count": 3
        },
        {
          "key": "cecil",
          "doc_count": 2
        },
        {
          "key": "brian",
          "doc_count": 1
        }
      ]
    }
  }
}

How can I get the count of cecil as 1 instead of 2?

Upvotes: 23

Views: 61463

Answers (5)

Use the cardinality aggregation. Docs: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html

Example:

{
  "size": 0,
  "aggs": {
    "uniqueValues": {
      "cardinality": {
        "field": "ourUniqueId.keyword",
        "precision_threshold": 100
      }
    }
  }
}
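As a minimal sketch, assuming a cluster reachable at `$ES_URL` (default `http://localhost:9200`), the `store` index from the question, and `jq` installed, the same aggregation can be wrapped in a small shell function. The helper names `es_search` and `distinct_count` and the aggregation name `distinct` are hypothetical. The usage line targets `RestaurantId`, a numeric field, so the count is not affected by how `RestaurantName` is analyzed.

```shell
#!/bin/sh
# Sketch only: es_search and distinct_count are hypothetical helper names.
# Assumes jq is installed and Elasticsearch is reachable at $ES_URL.
ES_URL="${ES_URL:-http://localhost:9200}"

# POST a search body to the question's "store" index and print the raw JSON response.
es_search() {
  curl -s -H 'Content-Type: application/json' -XPOST "$ES_URL/store/_search" -d "$1"
}

# Print the approximate number of distinct values of field $1 using a
# cardinality aggregation ("size": 0 means no hits, only the aggregation).
distinct_count() {
  body=$(jq -cn --arg field "$1" \
    '{size: 0, aggs: {distinct: {cardinality: {field: $field, precision_threshold: 100}}}}')
  es_search "$body" | jq '.aggregations.distinct.value'
}

# Example: distinct_count RestaurantId
```

With the three sample documents from the question, `distinct_count RestaurantId` is expected to print 2; counts well below `precision_threshold` should be close to exact.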

Upvotes: 1

Pratik Patil

Reputation: 107

It is too late for me to answer this question for the original author, but for anybody who faces the same issue and ends up here, my answer might help.

ES does provide the cardinality aggregation to get a distinct count, but its result is approximate (it is based on the HyperLogLog++ algorithm). When you need an exact count, a different approach is required. I have written an article on this which might help: Accurate Distinct Count and Values from Elasticsearch.
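One possible exact approach (not necessarily the one described in the linked article) is to page through every distinct value with a `composite` aggregation and add up the bucket counts. This is a sketch under several assumptions: `jq` is installed, the Elasticsearch version is recent enough that composite responses carry an `after_key`, and the field is a keyword field. `RestaurantName.keyword` is a hypothetical field name that exists only if your mapping defines it; `es_search`, `count_distinct_exact`, and the page size of 1000 are likewise illustrative choices.

```shell
#!/bin/sh
# Sketch only: es_search and count_distinct_exact are hypothetical helper names.
# Assumes jq is installed, Elasticsearch is reachable at $ES_URL, and the
# composite aggregation returns "after_key" for each non-empty page.
ES_URL="${ES_URL:-http://localhost:9200}"

# POST a search body to the question's "store" index and print the raw JSON response.
es_search() {
  curl -s -H 'Content-Type: application/json' -XPOST "$ES_URL/store/_search" -d "$1"
}

# Print the exact number of distinct values of keyword field $1 by paging
# through all composite buckets (1000 per page) and summing the page sizes.
count_distinct_exact() {
  local field="$1" after="null" total=0 body resp n
  while :; do
    # Build the request; "after" is only set from the second page on.
    body=$(jq -cn --arg field "$field" --argjson after "$after" '
      {size: 0,
       aggs: {distinct: {composite: {
         size: 1000,
         sources: [{value: {terms: {field: $field}}}]}}}}
      | if $after != null then .aggs.distinct.composite.after = $after else . end')
    resp=$(es_search "$body") || return 1
    n=$(printf '%s' "$resp" | jq '.aggregations.distinct.buckets | length')
    total=$((total + n))
    after=$(printf '%s' "$resp" | jq -c '.aggregations.distinct.after_key // null')
    # Stop on an empty page or when no after_key is returned.
    if [ "$n" -eq 0 ] || [ "$after" = "null" ]; then
      break
    fi
  done
  echo "$total"
}

# Example: count_distinct_exact RestaurantName.keyword
```

Paging keeps each response small, so the exact count does not depend on guessing a `size` large enough for a single `terms` request.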

Upvotes: 5

c24b

Reputation: 5552

You have to use the cardinality aggregation, as mentioned by @coder; it is described in the documentation:

$ curl -XGET "http://localhost:9200/store/item/_search" -H 'Content-Type: application/json' -d '
{
  "aggs": {
    "restaurant_count": {
      "cardinality": {
        "field": "RestaurantName",
        "precision_threshold": 100,
        "rehash": false
      }
    }
  }
}'

This worked for me.

Upvotes: 14

asu

Reputation: 569

There's no support for exact distinct counting in Elasticsearch, although approximate counting exists (the cardinality aggregation). Use a "terms" aggregation and count the buckets in the result. See the "Count distinct on elastic search" question.
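A minimal sketch of counting buckets in shell, assuming `jq` is installed; `count_buckets` is a hypothetical helper that reads a search response on stdin and prints how many buckets the named aggregation returned. The bucket count equals the number of distinct values only if the request's `size` is large enough to return every term and the field is not analyzed. Applied to the response shown in the question it gives 3, because `RestaurantName` was split into the tokens `restaurant`, `cecil`, and `brian`.

```shell
#!/bin/sh
# Sketch only: count_buckets is a hypothetical helper; assumes jq is installed.
# Reads a search response (JSON) on stdin and prints the number of buckets
# returned by the aggregation named $1.
count_buckets() {
  jq --arg agg "$1" '.aggregations[$agg].buckets | length'
}

# Example, using the terms request from the question:
# curl -s -XPOST "http://localhost:9200/store/item/_search" \
#   -H 'Content-Type: application/json' \
#   -d '{"size": 0, "aggs": {"item": {"terms": {"field": "RestaurantName"}}}}' \
#   | count_buckets item
```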

Upvotes: 0
