epipko
epipko

Reputation: 397

Elasticsearch, how to return unique values of two fields

I have an index with 20 different fields. I need to be able to pull unique docs where combination of fields "cat" and "sub" are unique. In SQL it would look this way: select unique cat, sub from table A; I can do it for one field this way:

{
"size": 0,
"aggs" : {
    "unique_set" : {
        "terms" : { "field" : "cat" }
    }
}}

but how do I add another field to check uniqueness across two fields?

Thanks,

Upvotes: 6

Views: 6917

Answers (3)

Kyle McClellan
Kyle McClellan

Reputation: 985

SQL's SELECT DISTINCT [cat], [sub] can be imitated with a Composite Aggregation.

{
  "size": 0, 
  "aggs": {
    "cat_sub": {
      "composite": {
        "sources": [
          { "cat": { "terms": { "field": "cat" } } },
          { "sub": { "terms": { "field": "sub" } } }
        ]
      }
    }
  }
}

Returns...

"buckets" : [
  {
    "key" : {
      "cat" : "a",
      "sub" : "x"
    },
    "doc_count" : 1
  },
  {
    "key" : {
      "cat" : "a",
      "sub" : "y"
    },
    "doc_count" : 2
  },
  {
    "key" : {
      "cat" : "b",
      "sub" : "y"
    },
    "doc_count" : 3
  }
]

Upvotes: 4

Fuad Efendi
Fuad Efendi

Reputation: 137

Quote:

I need to be able to pull unique docs where combination of fields "cat" and "sub" are unique.

This is nonsense; your question is unclear. You can have 10s unique pairs {cat, sub}, and 100s unique triplets {cat, sub, field_3}, and 1000s unique documents Doc{cat, sub, field3, field4, ...}.

If you are interested in document counts per unique pair {"Category X", "Subcategory Y"} then you can use Cardinality aggregations. For two or more fields you will need to use scripting which will come with performance hit.

Example:

{
    "aggs" : {
        "multi_field_cardinality" : {
            "cardinality" : {
                "script": "doc['cats'].value + ' _my_custom_separator_ ' + doc['sub'].value"
            }
        }
    }
}

Alternate solution: use nested Terms terms aggregations.

Upvotes: -3

Lilith Wittmann
Lilith Wittmann

Reputation: 383

The only way to solve this are probably nested aggregations:

{
"size": 0,
    "aggs" : {
        "unique_set_1" : {

            "terms" : {
                     "field" : "cats"
            },
            "aggregations" : { 
                "unique_set_2": {
                    "terms": {"field": "sub"}
                }
            }
        }
    }

}

Upvotes: 1

Related Questions