Reputation: 5191

Elasticsearch: Distinct elements from multiple fields

I created a mapping to index my MongoDB collection with Elasticsearch. Here are the mapping properties:

"properties" : {
  "address_components" : {
    "properties" : {
      "_id" : {
        "type" : "string"
      },
      "subLocality1" : {
        "type" : "string",
        "index" : "not_analyzed"
      },
      "subLocality2" : {
        "type" : "string",
        "index" : "not_analyzed"
      },
      "subLocality3" : {
        "type" : "string",
        "index" : "not_analyzed"
      },
      "city" : {
        "type" : "string",
        "index" : "not_analyzed"
      }
    }
  }
}

Now I want to retrieve the overall distinct values across these fields: subLocality1, subLocality2, subLocality3, and city. Each distinct value should contain q as a substring, and each should be returned together with its corresponding city value.

Example:

"address_components" : {
    "subLocality1" : "s1",
    "subLocality2" : "s1",
    "subLocality3" : "s2",
    "city" : "a"
  }

"address_components" : {
    "subLocality1" : "s3",
    "subLocality2" : "s1",
    "subLocality3" : "s2",
    "city" : "a"
  }

"address_components" : {
    "subLocality1" : "s2",
    "subLocality2" : "s1",
    "subLocality3" : "s4",
    "city" : "a"
  }

For the documents above, the expected result is:

"address_components" : {
    "subLocality1" : "s1",
    "subLocality2" : "s1",
    "subLocality3" : "s2",
    "city" : "ct1"
  }

"address_components" : {
    "subLocality1" : "s3",
    "subLocality2" : "s1",
    "subLocality3" : "s2",
    "city" : "ct1"
  }

"address_components" : {
    "subLocality1" : "s2",
    "subLocality2" : "s1",
    "subLocality3" : "s4",
    "city" : "ct1"
  }
{s1, a}, {s2,a}, {s3,a}, {s4,a},{a,a}
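
To make the requirement concrete, here is a minimal Python sketch that computes the (value, city) pairs client-side from three documents whose city is "a", matching the pairs listed above. It is not an Elasticsearch query: the function name `distinct_values_with_city` and the in-memory `docs` list are hypothetical, and `q` is treated as a plain substring.

```python
# Fields whose values should be collected (taken from the mapping in the question).
FIELDS = ("subLocality1", "subLocality2", "subLocality3", "city")


def distinct_values_with_city(docs, q=""):
    """Return sorted, distinct (value, city) pairs for field values containing q."""
    pairs = set()
    for doc in docs:
        components = doc["address_components"]
        city = components["city"]
        for field in FIELDS:
            value = components.get(field)
            # Keep only values that are present and contain q as a substring.
            if value is not None and q in value:
                pairs.add((value, city))
    return sorted(pairs)


# Hypothetical in-memory copies of the example documents (city "a").
docs = [
    {"address_components": {"subLocality1": "s1", "subLocality2": "s1",
                            "subLocality3": "s2", "city": "a"}},
    {"address_components": {"subLocality1": "s3", "subLocality2": "s1",
                            "subLocality3": "s2", "city": "a"}},
    {"address_components": {"subLocality1": "s2", "subLocality2": "s1",
                            "subLocality3": "s4", "city": "a"}},
]

for value, city in distinct_values_with_city(docs):
    print(value, city)
```

With an empty `q` every distinct value is kept; passing `q="s"` would drop the pair that comes from the city field itself.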

I tried doing this with an Elasticsearch terms aggregation:

GET /rescu/rescu/_search?pretty=true&search_type=count

{
    "aggs" : {
        "distinct_locations" : {
            "terms" : {
                "script" : "doc['address_components.subLocality1'].value"
            }
        }
    }
}

However, according to the reference I found, a terms aggregation applies to only a single field.

Upvotes: 6

Answers (4)

Ritesh Kumar Gupta

Reputation: 5191

I found the answer myself after going through the Elasticsearch API docs. We need to use a script to retrieve terms from multiple fields:

GET /rescu/rescu/_search?pretty=true&search_type=count
{
  "aggs": {
    "distinct_locations": {
      "terms": {
        "script": "[doc['address_components.subLocality1'].value,doc['address_components.subLocality2'].value,doc['address_components.subLocality3'].value]",
        "size": 5000
      }
    }
  }
}
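
If the list of fields varies, the same request body can be generated rather than written by hand. The following is a minimal Python sketch under that assumption; the helper name `build_multi_field_terms_agg` is hypothetical, it only builds the JSON body (it does not talk to a cluster), and it assumes field names contain no single quotes and that the cluster accepts inline scripts as in the query above.

```python
import json


def build_multi_field_terms_agg(fields, agg_name="distinct_locations", size=5000):
    """Build a search body whose terms script emits the values of all given fields."""
    # One doc['<field>'].value expression per field, joined into a script array.
    values = ",".join("doc['{}'].value".format(field) for field in fields)
    return {
        "aggs": {
            agg_name: {
                "terms": {
                    "script": "[" + values + "]",
                    "size": size,
                }
            }
        }
    }


body = build_multi_field_terms_agg([
    "address_components.subLocality1",
    "address_components.subLocality2",
    "address_components.subLocality3",
])
print(json.dumps(body, indent=2))
```

Adding `"address_components.city"` to the list would include city values in the same set of buckets.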

Upvotes: 7

OmarOthman

Reputation: 1738

I came here from Google searching how to do this in a Kibana visualization.

Looks like Ritesh's answer is very helpful there as well.

I wanted to do a Unique Count aggregation on two fields: IPAddress and Message.

In Kibana visualizations, the JSON Input field lets you modify the aggregation part of the query sent to Elasticsearch.

However, you need only the script part of Ritesh's answer.

In my case:

{
    "script": "[doc['extra.IPAddress'].value,doc['extra.Message'].value]"
}

What is really missing from the documentation is that the script parameter takes precedence over the field parameter. That is what happens in Kibana: the field parameter is sent from the interface, and the script parameter is sent because you added it in the JSON Input textbox.

Upvotes: 2

Kasper Gyselinck

Reputation: 175

If you use the query provided by Fuad Efendi:

{
  "size": 0,
  "aggs": {
    "country": {
      "terms": {
        "field": "country"
      },
      "aggregations": {
        "city": {
          "terms": {
            "field": "city"
          }
        }
      }
    }
  }
}

It is important to note that the first aggregation will be scoped to any "query" you add, but the second aggregation on "city" will not and will instead be scoped to the entire database. This might not be what you want.

Personally, I find that the script-based answer provided by ritesh_NITW gives the best result.

Upvotes: 4

Fuad Efendi

Reputation: 147

Here is an example with two fields, country and city. It uses an aggregation by country with a sub-aggregation by city:

{
  "size": 0,
  "aggs": {
    "country": {
      "terms": {
        "field": "country"
      },
      "aggregations": {
        "city": {
          "terms": {
            "field": "city"
          }
        }
      }
    }
  }
}

You can use many layers of sub-aggregations.
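
The response to a nested terms aggregation comes back as buckets within buckets, so pairing each city with its country takes a small amount of client-side flattening. Here is a minimal Python sketch of that step; the helper name `flatten_buckets` is hypothetical and `sample_response` is made-up data shaped like the `aggregations` section of a search response, not output from a real cluster.

```python
def flatten_buckets(response, outer="country", inner="city"):
    """Return (outer_key, inner_key, doc_count) for each nested terms bucket."""
    rows = []
    for outer_bucket in response["aggregations"][outer]["buckets"]:
        # Each outer bucket carries its own sub-aggregation under the inner name.
        for inner_bucket in outer_bucket[inner]["buckets"]:
            rows.append(
                (outer_bucket["key"], inner_bucket["key"], inner_bucket["doc_count"])
            )
    return rows


# Made-up response fragment, for illustration only.
sample_response = {
    "aggregations": {
        "country": {
            "buckets": [
                {"key": "us", "doc_count": 3, "city": {"buckets": [
                    {"key": "boston", "doc_count": 2},
                    {"key": "austin", "doc_count": 1},
                ]}},
                {"key": "fr", "doc_count": 1, "city": {"buckets": [
                    {"key": "paris", "doc_count": 1},
                ]}},
            ]
        }
    }
}

for country, city, count in flatten_buckets(sample_response):
    print(country, city, count)
```

For deeper nesting, the same loop pattern would repeat once per aggregation level.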

Upvotes: 5
