Joey Yi Zhao

Reputation: 42684

How can I aggregate on the whole field value in Elasticsearch?

I am using Elasticsearch 7.15 and need to aggregate on a field and sort the resulting buckets by document count.

The documents stored in Elasticsearch look like this:

{
  "logGroup" : "/aws/lambda/myLambda1",
  ...
},
{
  "logGroup" : "/aws/lambda/myLambda2",
  ...
}

I need to find out which logGroup has the most documents. To do that, I tried a terms aggregation:

GET /my-index/_search?size=0
{
  "aggs": {
    "types_count": {
      "terms": {
        "field": "logGroup",
        "size": 10000
      }
    }
  }
}

The output of this query looks like this:

"aggregations" : {
    "types_count" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "aws",
          "doc_count" : 26303620
        },
        {
          "key" : "lambda",
          "doc_count" : 25554470
        },
        {
          "key" : "myLambda1",
          "doc_count" : 25279201
        }
...
}

As you can see from the output above, Elasticsearch splits the logGroup value into terms and aggregates on each term rather than on the whole string. Is there a way to aggregate on the whole string instead?

I expect the output to look like this:

"buckets" : [
        {
          "key" : "/aws/lambda/myLambda1",
          "doc_count" : 26303620
        },
        {
          "key" : "/aws/lambda/myLambda2",
          "doc_count" : 25554470
        },

The logGroup field in the index mapping is:

"logGroup" : {
          "type" : "text",
          "fielddata" : true
        },

Can I achieve it without updating the index?

Upvotes: 0

Views: 329

Answers (1)

Val

Reputation: 217594

To get the result you expect, you need to change your mapping to this:

    "logGroup" : {
      "type" : "keyword"
    },

Otherwise, your log groups get analyzed by the standard analyzer, which splits the whole string into separate tokens, and you won't be able to aggregate on the full log group values.
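
To make the difference concrete, here is a small Python sketch that imitates the two behaviours outside Elasticsearch. The `rough_text_tokens` helper is a hypothetical, simplified stand-in for the standard analyzer (lowercase, then split on non-alphanumeric characters); it is not the real tokenizer, and the exact tokens you see depend on the analyzer configured on your field. The sample values come from the question.

```python
import re
from collections import Counter

# Sample logGroup values, as shown in the question.
log_groups = [
    "/aws/lambda/myLambda1",
    "/aws/lambda/myLambda1",
    "/aws/lambda/myLambda2",
]


def rough_text_tokens(value):
    """Hypothetical approximation of the standard analyzer:
    lowercase the value and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", value.lower())


# "text" field: every document is counted once under each token it contains.
text_buckets = Counter(
    token for value in log_groups for token in set(rough_text_tokens(value))
)

# "keyword" field: the whole string is indexed as a single term.
keyword_buckets = Counter(log_groups)

print(text_buckets.most_common())
print(keyword_buckets.most_common())
```

With the `text` behaviour the buckets are keyed by fragments such as `aws` and `lambda`; with the `keyword` behaviour each bucket key is a full log group path.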

If you can't or don't want to change the mapping and reindex everything, you can do the following instead:

First, add a keyword sub-field to your mapping, like this:

PUT /my-index/_mapping
{
    "properties": {
        "logGroup" : {
            "type" : "text",
            "fields": {
                "keyword": {
                    "type" : "keyword"
                }
            }
        }
    }
}

And then run the following so that all existing documents pick up this new field:

POST my-index/_update_by_query?wait_for_completion=false

Finally, you'll be able to achieve what you want with the following query:

GET /my-index/_search
{
  "size": 0,
  "aggs": {
    "types_count": {
      "terms": {
        "field": "logGroup.keyword",
        "size": 10000
      }
    }
  }
}
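
To answer the original question (which logGroup has the most documents), you only need the bucket with the highest `doc_count`. The terms aggregation orders buckets by `doc_count` descending by default, so this is normally the first bucket; the sketch below uses `max` so that it does not depend on that ordering. The `top_log_group` helper is a hypothetical name, and `sample_response` is a mock built from the expected output in the question, not real query output.

```python
def top_log_group(response, agg_name="types_count"):
    """Return (key, doc_count) of the largest bucket, or None if there are no buckets."""
    buckets = response["aggregations"][agg_name]["buckets"]
    if not buckets:
        return None
    top = max(buckets, key=lambda bucket: bucket["doc_count"])
    return top["key"], top["doc_count"]


# Mock response shaped like the expected output in the question.
sample_response = {
    "aggregations": {
        "types_count": {
            "buckets": [
                {"key": "/aws/lambda/myLambda1", "doc_count": 26303620},
                {"key": "/aws/lambda/myLambda2", "doc_count": 25554470},
            ]
        }
    }
}

print(top_log_group(sample_response))
```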

Upvotes: 1
