Compare two elasticsearch document fields from different indexes partitioned by date

Question

I've got data partitioned by day with the following simplified structure (consider objects within array being separate documents). The actual amount of documents will be more than 10 million:

/products-2019.01.01

[
  {
    "id": 1,
    "name": "Product1",
    "price": 15
  },
  {
    "id": 2,
    "name": "Product2",
    "price": 10
  },
  {
    "id": 3,
    "name": "Product3",
    "price": 10
  }
]

/products-2019.01.02

[
  {
    "id": 1,
    "name": "Product1",
    "price": 10
  },
  {
    "id": 2,
    "name": "Product2",
    "price": 12
  },
  {
    "id": 3,
    "name": "Product3",
    "price": 10
  }
]

/products-2019.01.03

[
  {
    "id": 1,
    "name": "Product1",
    "price": 10
  },
  {
    "id": 2,
    "name": "Product2",
    "price": 10
  },
  {
    "id": 3,
    "name": "Product3",
    "price": 13
  }
]

My use case is that I need to compare prices between two exact days, in this case, it could be:

2019.01.01 vs 2019.01.02
2019.01.01 vs 2019.01.03
2019.01.02 vs 2019.01.03

Obviously, the more daily data is there, the more possible combinations are there.

If I had to compare 2019.01.01 to 2019.01.03, the query would have to return 2, because the price of product 1 and product 3 differ between these indexes.

I am looking for help to build such a query and am open to any other data model suggestions.

Andrey Borisko · Accepted Answer

I think you can just aggregate by id and price in this case and add min_doc_count to filter out where it's not the same across two indexes

GET logs1,logs2/_search
{
  "size": 0,
  "aggs": {
    "by_product": {
      "terms": {
        "script": {
          "inline": "doc.id+'_'+doc.price"
        },
        "min_doc_count": 2
      }
    }
  }
}

Compare two elasticsearch document fields from different indexes partitioned by date

Answers (1)

Related Questions