Reputation: 159
I want to group the search result by field. Example: I have data with userId corresponding to multiple usernames. So in search result I want to group all the userId and its corresponding usernames.
Currently using aggregation, I am able to group the userId but not able to retrieve its corresponding usernames list. I am getting something as follow.
"aggregations" : {
"by_user_id" : {
"after_key" : {
"group_by_search" : 2335
},
"buckets" : [
{
"key" : {
"group_by_search" : 2
},
"doc_count" : 2
},
{
"key" : {
"group_by_search" : 1000
},
"doc_count" : 4
},
{
"key" : {
"group_by_search" : 2335
},
"doc_count" : 2
}
]
}
What I want is:
"aggregations" : {
"by_corp_id" : {
"after_key" : {
"group_by_search" : 2335
},
"buckets" : [
{
"key" : {
"group_by_search" : 2
"usernames":[1111,222] ***//this is list of usernames having same userId***
},
"doc_count" : 2
},
{
"key" : {
"group_by_search" : 1000
"usernames":[11 ,0101,1199,222] ***//this is list of usernames having same userId***
},
"doc_count" : 4
},
{
"key" : {
"group_by_search" : 2335
"usernames":[1111,222] ***//this is list of usernames having same userId***
},
"doc_count" : 2
}
]
}
Is there any way to directly achieve this using aggregation in Elasticsearch?
Update: I am using following aggregation
"aggregations": {
"by_user_id": {
"composite": {
"size": 1000,
"sources": [
{
"group_by_search": {
"terms": {
"field": "user_id",
"missing_bucket": false,
"order": "asc"
}
}
}
]
}
}
}
Thank you.
Upvotes: 1
Views: 568
Reputation: 217304
What you need to do is simply add a terms
sub-aggregation on the username field so that each buckets gets a list of all unique usernames:
"aggregations": {
"by_user_id": {
"composite": {
"size": 1000,
"sources": [
{
"group_by_search": {
"terms": {
"field": "user_id",
"missing_bucket": false,
"order": "asc"
}
}
}
]
},
"aggs": {
"username": {
"terms": {
"field": "username",
"size": 1000
}
}
}
}
}
top_hits
would also be possible but you will get a lot of duplicates and you will need to return a lot of hits in order to make sure you have all the possible distinct usernames.
If your username field has a high cardinality (>1000), then it might be better to simply move the terms aggregation on username into the composite source array and iterate over all buckets yourself, like this:
"aggregations": {
"by_user_id": {
"composite": {
"size": 1000,
"sources": [
{
"group_by_search": {
"terms": {
"field": "user_id",
"missing_bucket": false,
"order": "asc"
}
}
},
{
"group_by_username": {
"terms": {
"field": "username",
"missing_bucket": false,
"order": "asc"
}
}
}
]
}
}
}
Upvotes: 1
Reputation: 16172
You can use top hits aggregation to get a list of all usernames having the same id.
Adding a working example
Index Data:
{
"usernames": 3,
"user_id": 2
}
{
"usernames": 1,
"user_id": 1
}
{
"usernames": 2,
"user_id": 1
}
Search Query:
{
"size": 0,
"aggregations": {
"by_user_id": {
"composite": {
"size": 1000,
"sources": [
{
"group_by_search": {
"terms": {
"field": "user_id",
"missing_bucket": false,
"order": "asc"
}
}
}
]
},
"aggs": {
"list_names": {
"top_hits": {
"_source": {
"includes": [
"usernames"
]
}
}
}
}
}
}
}
Search Result:
"aggregations": {
"by_user_id": {
"after_key": {
"group_by_search": 2
},
"buckets": [
{
"key": {
"group_by_search": 1 // note this
},
"doc_count": 2,
"list_names": {
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "66362501",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"usernames": 1 // note this
}
},
{
"_index": "66362501",
"_type": "_doc",
"_id": "2",
"_score": 1.0,
"_source": {
"usernames": 2 // note this
}
}
]
}
}
},
{
"key": {
"group_by_search": 2
},
"doc_count": 1,
"list_names": {
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "66362501",
"_type": "_doc",
"_id": "3",
"_score": 1.0,
"_source": {
"usernames": 3
}
}
]
}
}
}
]
}
}
Upvotes: 0