Reputation: 584
I have millions of documents with a block like this one:
{
"useraccountid": 123456,
"purchases_history" : {
"last_updated" : "Sat Apr 27 13:41:46 UTC 2019",
"purchases" : [
{
"purchase_id" : 19854284,
"purchase_date" : "Jan 11, 2017 7:53:35 PM"
},
{
"purchase_id" : 19854285,
"purchase_date" : "Jan 12, 2017 7:53:35 PM"
},
{
"purchase_id" : 19854286,
"purchase_date" : "Jan 13, 2017 7:53:35 PM"
}
]
}
}
I am trying to figure out how I can do something like:
SELECT useraccountid, max(purchases_history.purchases.purchase_date) FROM my_index GROUP BY useraccountid
I only found the max aggregation but it aggregates over all the documents in the index, but this is not what I need. I need to find the max purchase date for each document. I believe there must be a way to iterate over each path purchases_history.purchases.purchase_date of each document to identify which one is the max purchase date, but I really cannot find how to do it (if this is really the best way of course).
Any suggestion?
Upvotes: 0
Views: 938
Reputation: 1428
I assume that your field useraccountid
is unique. You will have to do a terms aggregation, inside do the max aggregation. I can think of this:
"aggs":{
"unique_user_ids":{
"terms":{
"field": "useraccountid",
"size": 10000 #Default value is 10
},
"aggs":{
"max_date":{
"max":{
"field": "purchases_history.purchases.purchase_date"
}
}
}
}
}
In the aggregations
field you'll see first the unique user ID and inside, their max date.
Note the 10,000 in the size. The terms
aggregation is only recommended to return until 10,000 results.
If you need more, you can play with the Composite aggregation. With that, you can paginate your results and your cluster won't get performance issues.
I can think of the following if you want to play with Composite:
GET /_search
{
"aggs" : {
"my_buckets": {
"composite" : {
"size": 10000, #Default set to 10
"sources" : [
{ "user_id": { "terms": {"field": "useraccountid" } } },
{ "product": { "max": { "field": "purchases_history.purchases.purchase_date" } } }
]
}
}
}
}
After running the query, it will return a field called after_key
. With that field you can paginate your results in pages of 10,000 elements. Take a look at the After parameter for the composite aggregation.
Hope this is helpful! :D
Upvotes: 2