Brecht
Brecht

Reputation: 361

Search for documents by minimum value of field

I'm trying to filter products by their price, and I'm completely stumped as to how to proceed. Hoping someone can shed some light on this, and maybe point me in the right direction.

Concept

Each product has multiple prices. These prices are valid during a certain date-range. The actual price of the product at a certain date is the lowest price that is valid on that date.

Goal

I want to be able to:

caveat: I have simplified the restrictions for the prices for this example, but I'm not able to consolidate the dates so there's only 1 valid per date range.

Example

Mapping:

curl -XPUT 'http://localhost:9200/price-filter-test'

curl -XPUT 'http://localhost:9200/price-filter-test/_mapping/_doc' -H 'Content-Type: application/json' -d '{
    "properties": {
        "id": {"type": "integer"},
        "name": {"type": "text"},
        "prices": {
            "type": "nested",
            "properties": {
                "price": {"type": "integer"},
                "from": {"type": "date"},
                "untill": {"type": "date"}
            }
        }
    }
}'

Test entries:

curl -XPUT 'http://localhost:9200/price-filter-test/_doc/1' -H 'Content-Type: application/json' -d '{
    "id": 1,
    "name": "Product A",
    "prices": [
        {
            "price": 10,
            "from": "2020-02-01",
            "untill": "2020-03-01"
        },
        {
            "price": 8,
            "from": "2020-02-20",
            "untill": "2020-02-21"
        },
        {
            "price": 12,
            "from": "2020-02-22",
            "untill": "2020-02-23"
        }
    ]
}'

curl -XPUT 'http://localhost:9200/price-filter-test/_doc/2' -H 'Content-Type: application/json' -d '{
    "id": 2,
    "name": "Product B",
    "prices": [
        {
            "price": 20,
            "from": "2020-02-01",
            "untill": "2020-03-01"
        },
        {
            "price": 18,
            "from": "2020-02-20",
            "untill": "2020-02-21"
        },
        {
            "price": 22,
            "from": "2020-02-22",
            "untill": "2020-02-23"
        }
    ]
}'

At 2020-02-20 entries the following prices will valid, correct prices in bold:

Solution

Min/Max

I have figured out how to get the min and max values of the applicable prices. This was pretty doable using aggregations:

curl -XGET 'http://localhost:9200/price-filter-test/_search?pretty=true' -H 'Content-Type: application/json' -d '{
    "query": {"match_all": {}},
    "size": 0,
    "aggs": {
        "product_ids": {
            "terms": {"field": "id"},
            "aggs": {
                "nested_prices": {
                    "nested": {"path": "prices"},
                    "aggs": {
                        "applicable_prices": {
                            "filter": {
                                "bool": {
                                    "must": [
                                        {"range": {"prices.from": {"lte": "2020-02-20"}}},
                                        {"range": {"prices.untill": {"gte": "2020-02-20"}}}
                                    ]
                                }
                            },
                            "aggs": {
                                "min_price": {
                                    "min": {"field": "prices.price"}
                                }
                            }
                        }
                    }
                }
            }
        },
        "stats_min_prices": {
            "stats_bucket": {
                "buckets_path": "product_ids>nested_prices>applicable_prices>min_price"
            }
        }
    }
}'

Here I first aggregate over the different ids, to ensure prices are checked per product, then I filter by applicable dates, and then get the min prices for each. Using the stats_bucket aggregation, I'm then able to get the min and max values of these minimum prices.

{
  // ...
  "aggregations" : {
    // ...
    "stats_min_prices" : {
      "count" : 2,
      "min" : 8.0,
      "max" : 18.0,
      "avg" : 13.0,
      "sum" : 26.0
    }
  }
}

Here we see the correct min (8 for Product A) and max (18 for Product B)

Filtering

For filtering, I need to be able to exclude products based on their lowest price. e.g. If I search for products that cost at least 19, I shouldn't find any as Product B's lowest price is 18

curl -X GET "localhost:9200/price-filter-test/_search?pretty" -H 'Content-Type: application/json' -d '{
    "query": {
        "nested": {
            "path": "prices",
            "query": {
                "bool": {
                    "must": [
                        {
                            "range" : {
                                "prices.price" : {"gte" : 19}
                            }
                        },
                        {"range": {"prices.from": {"lte": "2020-02-20"}}},
                        {"range": {"prices.untill": {"gte": "2020-02-20"}}}
                    ]
                }
            }
        }
    }
}'

This attempt, however, still yields "Product B" as a match, as one of the prices in this date range is higher than 19. However, as it is not the lowest price in this date range, it is not the "correct" price.

I'm completely stumped as to how to do this. I've thought about using scripted fields, but I think I'd need to combine 2 (1 for calculated applicable prices, 1 for getting the lowest), and this doesn't appear to be an option.

Hope you can point me in the right direction

Upvotes: 1

Views: 755

Answers (1)

LeBigCat
LeBigCat

Reputation: 1770

Well if i right you are looking for inner_hits: https://www.elastic.co/guide/en/elasticsearch/reference/6.8/search-request-inner-hits.html

I was not sure for the aggregation (you cant inject inner_hits in the aggregation) what s why i didnot post at start.

Hope it s what you need.

{
  "query": {
    "nested": {
      "path": "prices",
      "query": {
        "range": {
          "prices.price": {
            "gte": 10,
            "lte": 20
          }
        }
      },
      "inner_hits": {}
    }
  }
}

=> will keep only nested doc mathing with the range in the inner_hits part:

"inner_hits":{
   "prices":{
      "hits":{
         "total":2,
         "max_score":1,
         "hits":[
            {
               "_nested":{
                  "field":"prices",
                  "offset":1
               },
               "_score":1,
               "_source":{
                  "price":18,
                  "from":"2020-02-20",
                  "untill":"2020-02-21"
               }
            },
            {
               "_nested":{
                  "field":"prices",
                  "offset":0
               },
               "_score":1,
               "_source":{
                  "price":20,
                  "from":"2020-02-01",
                  "untill":"2020-03-01"
               }
            }
         ]
      }
   }
}

Upvotes: 1

Related Questions