Reputation: 401
I want to aggregate the result of other aggregation using Elasticsearch. I have created the first aggregation I need:
es.search(index='stackoverflow', body = {
"size":0,
"query": {
"bool": {
"filter": {
"match" : {"type": "Posts"}
},
"filter": {
"match" : {"PostTypeId": "1"}
}
}
},
"aggs" : {
"by_user": {
"terms": {
"field": "OwnerUserId"
}
}
}
})
This query takes all the documents of type post that are questions (PostTypeId = 1). Then, it aggregates by OwnerUserId, which counts the number of question posts of each user, giving the following result:
{'took': 0,
'timed_out': False,
'_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
'hits': {'total': {'value': 10000, 'relation': 'gte'},
'max_score': None,
'hits': []},
'aggregations': {'by_user': {'doc_count_error_upper_bound': 0,
'sum_other_doc_count': 31053,
'buckets': [{'key': '2230', 'doc_count': 223},
{'key': '', 'doc_count': 177},
{'key': '38304', 'doc_count': 158},
{'key': '5997', 'doc_count': 144},
{'key': '4048', 'doc_count': 130},
{'key': '25813', 'doc_count': 119},
{'key': '27826', 'doc_count': 119},
{'key': '2633', 'doc_count': 115},
{'key': '19919', 'doc_count': 114},
{'key': '13938', 'doc_count': 111}]}}}
Now I want to do another aggregation over the results of the previous one: aggregate by doc_count, I mean grouping and counting the equal number of question posts. For the previous result, my desired result would be:
{'buckets': [{'key': '223', 'doc_count': 1},
{'key': '177', 'doc_count': 1},
{'key': '158', 'doc_count': 1},
{'key': '144', 'doc_count': 1},
{'key': '130', 'doc_count': 1},
{'key': '119', 'doc_count': 2},
{'key': '115', 'doc_count': 1},
{'key': '114', 'doc_count': 1},
{'key': '111', 'doc_count': 1}]}
Upvotes: 2
Views: 3004
Reputation: 1
The "Scripted Metric Aggregation" is a thing that will help to make such thing and get the statistics as a response from singe query.
The advantage of this approach is that you can include any logic and retrieve the necessary data.
Hope this will help others who is still looking for the resolution for the same kinds of tasks.
Upvotes: 0
Reputation: 401
I could find a way to aggregate the result of an aggregation (at least directly). As I have read in the forums of Elasticsearch, this use case is not contemplate because it would be so inefficient.
What I did in order to solve my use case was take advantage of the transform API to store the first aggregation in a temporal index and then perform the second aggregation over that index.
First I create a transform to perform the first aggregation (grouping by OwnerUserId and counting the number of questions posted by each user):
url = 'http://localhost:9200/_transform/transform_rq1'
headers = {
'Content-Type': 'application/json'
}
query = {
"source": {
"index": "posts",
"query": {
"bool": {
"filter": {
"match" : {"PostTypeId": "1"}
}
}
}
},
"dest": {
"index": "rq1"
},
"pivot": {
"group_by": {
"OwnerUserId": {
"terms": {
"field": "OwnerUserId"
}
}
},
"aggregations": {
"count": {
"value_count": {
"field": "OwnerUserId"
}
}
}
}
}
response = requests.put(url, headers=headers, data=json.dumps(query))
Then, I start the transform in order to execute it:
url = 'http://localhost:9200/_transform/transform_rq1/_start'
headers = {
'Content-Type': 'application/json'
}
response = requests.post(url, headers=headers).json()
At last, I perform the second aggregation (grouping by the number of questions of each user to obtain how many users post how much questions) over the created temporal index:
response = es.search(index='rq1', body = {
"size":0,
"query": {
"match_all": {}
},
"aggs" : {
"by_num": {
"terms": {
"field": "count",
"order" : { "_key" : "asc" },
"size": 30000
}
}
}
})
print(response)
As you can see, I have written this code in Python.
Upvotes: 1