Reputation: 14097
I've got data partitioned by day with the following simplified structure (consider objects within array being separate documents). The actual amount of documents will be more than 10 million:
/products-2019.01.01
[
{
"id": 1,
"name": "Product1",
"price": 15
},
{
"id": 2,
"name": "Product2",
"price": 10
},
{
"id": 3,
"name": "Product3",
"price": 10
}
]
/products-2019.01.02
[
{
"id": 1,
"name": "Product1",
"price": 10
},
{
"id": 2,
"name": "Product2",
"price": 12
},
{
"id": 3,
"name": "Product3",
"price": 10
}
]
/products-2019.01.03
[
{
"id": 1,
"name": "Product1",
"price": 10
},
{
"id": 2,
"name": "Product2",
"price": 10
},
{
"id": 3,
"name": "Product3",
"price": 13
}
]
My use case is that I need to compare prices between two exact days, in this case, it could be:
Obviously, the more daily data is there, the more possible combinations are there.
If I had to compare 2019.01.01 to 2019.01.03, the query would have to return 2, because the price of product 1 and product 3 differ between these indexes.
I am looking for help to build such a query and am open to any other data model suggestions.
Upvotes: 1
Views: 2157
Reputation: 4609
I think you can just aggregate by id and price in this case and add min_doc_count
to filter out where it's not the same across two indexes
GET logs1,logs2/_search
{
"size": 0,
"aggs": {
"by_product": {
"terms": {
"script": {
"inline": "doc.id+'_'+doc.price"
},
"min_doc_count": 2
}
}
}
}
Upvotes: 2