Sumit Nekar

Reputation: 225

Finding unique documents in an index in elastic search

I have duplicate entries in my index and I want to retrieve only the unique documents. The top hits aggregation solves this problem, but my other requirement is to support sorting on the results (across buckets), so I can't use it.
The other options I can think of are to write a plugin or to use a Painless script. I need help solving this. It would be great if you could point me to some examples.

Upvotes: 0

Views: 367

Answers (1)

Aman Garg

Reputation: 3290

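The top hits aggregation returns the matching documents themselves for each bucket, while the cardinality aggregation returns only an approximate count of the distinct values of a field. If a count of unique values is enough, you can use the cardinality aggregation like below: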

{
    "aggs" : {
        "UNIQUE_COUNT" : {
            "cardinality" : {
                "field" : "your_field"
            }
        }
    }
}

This aggregation comes with some caveats, chiefly that the count it returns is approximate. See the Elasticsearch documentation below to understand it better. Link: Cardinality Aggregation
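As a rough illustration of how the request above and its response fit together, here is a minimal Python sketch. It only builds the request body and reads the count out of a response dictionary; sending the request (for example with an Elasticsearch client) is left out. The "size": 0 setting is my addition to skip returning hits, your_field is the same placeholder as above, and the sample response value is invented for illustration.

```python
import json


def build_cardinality_request(field):
    """Build a search body that asks only for the distinct-value count of `field`."""
    return {
        "size": 0,  # assumption: hits are not needed, only the aggregation result
        "aggs": {
            "UNIQUE_COUNT": {
                "cardinality": {"field": field}
            }
        },
    }


def read_unique_count(response):
    """Read the (approximate) distinct count from a search response dict."""
    return response["aggregations"]["UNIQUE_COUNT"]["value"]


body = build_cardinality_request("your_field")
print(json.dumps(body, indent=2))

# Hypothetical response fragment; the value 42 is invented for illustration.
sample_response = {"aggregations": {"UNIQUE_COUNT": {"value": 42}}}
print(read_unique_count(sample_response))
```

Note that this gives a count of distinct values, not the documents themselves.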

For sorting, refer to the example below, where the buckets created by the terms aggregation are ordered by the nested aggregation named in "order":

{
    "aggs": {
        "AGG_NAME": {
            "terms": {
                "field": "your_field",
                "size": 10,
                "order": {
                    "UNIQUE_COUNT": "asc"
                },
                "min_doc_count": 1
            },
            "aggs": {
                "UNIQUE_COUNT": {
                    "cardinality": {
                        "field": "your_field"
                    }
                }
            }    
        }
    }
}
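The response to a terms aggregation with a nested cardinality aggregation contains one bucket per term, each carrying its own UNIQUE_COUNT value. The following is a minimal Python sketch of reading those buckets. The sample response is invented for illustration and assumes the nested cardinality runs on a different field than the one used for grouping; otherwise every bucket would report 1. The client-side sort is only there to show the ascending order the query asks for.

```python
def read_buckets(response, agg_name="AGG_NAME", metric_name="UNIQUE_COUNT"):
    """Return (term, doc_count, unique_count) tuples from a terms aggregation response."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [(b["key"], b["doc_count"], b[metric_name]["value"]) for b in buckets]


# Hypothetical response fragment; keys and counts are invented for illustration.
sample_response = {
    "aggregations": {
        "AGG_NAME": {
            "buckets": [
                {"key": "a", "doc_count": 5, "UNIQUE_COUNT": {"value": 3}},
                {"key": "b", "doc_count": 2, "UNIQUE_COUNT": {"value": 1}},
            ]
        }
    }
}

rows = read_buckets(sample_response)

# Sort ascending by the unique count, mirroring the "asc" order in the query.
rows_sorted = sorted(rows, key=lambda row: row[2])

for key, doc_count, unique_count in rows_sorted:
    print(key, doc_count, unique_count)
```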

Upvotes: 1
