Manish
Manish

Reputation: 159

Elasticsearch group by field

I want to group the search result by field. Example: I have data with userId corresponding to multiple usernames. So in search result I want to group all the userId and its corresponding usernames.

Currently using aggregation, I am able to group the userId but not able to retrieve its corresponding usernames list. I am getting something as follow.

"aggregations" : {
"by_user_id" : {
  "after_key" : {
    "group_by_search" : 2335
  },
  "buckets" : [
    {
      "key" : {
        "group_by_search" : 2
      },
      "doc_count" : 2
    },
    {
      "key" : {
        "group_by_search" : 1000
      },
      "doc_count" : 4
    },
    {
      "key" : {
        "group_by_search" : 2335
      },
      "doc_count" : 2
    }
  ]
}

What I want is:

"aggregations" : {
"by_corp_id" : {
  "after_key" : {
    "group_by_search" : 2335
  },
  "buckets" : [
    {
      "key" : {
        "group_by_search" : 2
        "usernames":[1111,222] ***//this is list of usernames having same userId***
      },
      "doc_count" : 2
    },
    {
      "key" : {
        "group_by_search" : 1000
        "usernames":[11 ,0101,1199,222] ***//this is list of usernames having same userId***
      },
      "doc_count" : 4
    },
    {
      "key" : {
        "group_by_search" : 2335
        "usernames":[1111,222] ***//this is list of usernames having same userId***
      },
      "doc_count" : 2
    }
  ]
}

Is there any way to directly achieve this using aggregation in Elasticsearch?

Update: I am using following aggregation

"aggregations": {
    "by_user_id": {
        "composite": {
            "size": 1000,
            "sources": [
                {
                    "group_by_search": {
                        "terms": {
                            "field": "user_id",
                            "missing_bucket": false,
                            "order": "asc"
                        }
                    }
                }
            ]
        }
    }
}

Thank you.

Upvotes: 1

Views: 568

Answers (2)

Val
Val

Reputation: 217304

What you need to do is simply add a terms sub-aggregation on the username field so that each buckets gets a list of all unique usernames:

"aggregations": {
    "by_user_id": {
        "composite": {
            "size": 1000,
            "sources": [
                {
                    "group_by_search": {
                        "terms": {
                            "field": "user_id",
                            "missing_bucket": false,
                            "order": "asc"
                        }
                    }
                }
            ]
        },
        "aggs": {
            "username": {
                "terms": {
                    "field": "username",
                    "size": 1000
                }
            }
        }
    }
}

top_hits would also be possible but you will get a lot of duplicates and you will need to return a lot of hits in order to make sure you have all the possible distinct usernames.

If your username field has a high cardinality (>1000), then it might be better to simply move the terms aggregation on username into the composite source array and iterate over all buckets yourself, like this:

"aggregations": {
    "by_user_id": {
        "composite": {
            "size": 1000,
            "sources": [
                {
                    "group_by_search": {
                        "terms": {
                            "field": "user_id",
                            "missing_bucket": false,
                            "order": "asc"
                        }
                    }
                },
                {
                    "group_by_username": {
                        "terms": {
                            "field": "username",
                            "missing_bucket": false,
                            "order": "asc"
                        }
                    }
                }
            ]
        }
    }
}

Upvotes: 1

Bhavya
Bhavya

Reputation: 16172

You can use top hits aggregation to get a list of all usernames having the same id.

Adding a working example

Index Data:

{
  "usernames": 3,
  "user_id": 2
}
{
  "usernames": 1,
  "user_id": 1
}
{
  "usernames": 2,
  "user_id": 1
}

Search Query:

{
  "size": 0,
  "aggregations": {
    "by_user_id": {
      "composite": {
        "size": 1000,
        "sources": [
          {
            "group_by_search": {
              "terms": {
                "field": "user_id",
                "missing_bucket": false,
                "order": "asc"
              }
            }
          }
        ]
      },
      "aggs": {
        "list_names": {
          "top_hits": {
            "_source": {
              "includes": [
                "usernames"
              ]
            }
          }
        }
      }
    }
  }
}

Search Result:

"aggregations": {
    "by_user_id": {
      "after_key": {
        "group_by_search": 2      
      },
      "buckets": [
        {
          "key": {
            "group_by_search": 1        // note this
          },
          "doc_count": 2,
          "list_names": {
            "hits": {
              "total": {
                "value": 2,
                "relation": "eq"
              },
              "max_score": 1.0,
              "hits": [
                {
                  "_index": "66362501",
                  "_type": "_doc",
                  "_id": "1",
                  "_score": 1.0,
                  "_source": {
                    "usernames": 1             // note this
                  }
                },
                {
                  "_index": "66362501",
                  "_type": "_doc",
                  "_id": "2",
                  "_score": 1.0,
                  "_source": {
                    "usernames": 2           // note this
                  }
                }
              ]
            }
          }
        },
        {
          "key": {
            "group_by_search": 2
          },
          "doc_count": 1,
          "list_names": {
            "hits": {
              "total": {
                "value": 1,
                "relation": "eq"
              },
              "max_score": 1.0,
              "hits": [
                {
                  "_index": "66362501",
                  "_type": "_doc",
                  "_id": "3",
                  "_score": 1.0,
                  "_source": {
                    "usernames": 3       
                  }
                }
              ]
            }
          }
        }
      ]
    }
  }

Upvotes: 0

Related Questions