Junqueror

Reputation: 13

ElasticSearch: Is it possible to do a "Weighted Avg Aggregation" weighted by the score?

I'm trying to compute an average over a price field (price.avg), but I want the best matches of the query to have more impact on the average than the worse ones, so the average should be weighted by the calculated _score. This is the aggregation I'm implementing:

{
    "query": {...},
    "size": 100,
    "aggs": {
        "weighted_avg_price": {
            "weighted_avg": {
                "value": {
                    "field": "price.avg"
                },
                "weight": {
                    "script": "_score"
                }
            }
        }
    }
}

I expected this to give me what I want, but instead I receive a null value:

{...
    "hits": {...},
    "aggregations": {
        "weighted_avg_price": {
            "value": null
        }
    }
}

Is there something that I'm missing? Is this aggregation query feasible? Is there any workaround?

Upvotes: 1

Views: 630

Answers (2)

Junqueror

Reputation: 13

@jzzfs I'm trying the approach of averaging the first N results (ordered by _score), using a top_hits aggregation:

{
    "query": {
        "bool": {
            "should": [
                ...
            ],
            "minimum_should_match": 0
        }
    },
    "size": 0,
    "from": 0,
    "sort": [
        {
            "_score": {
                "order": "desc"
            }
        }
    ],
    "aggs": {
        "top_avg_price": {
            "avg": {
                "field": "price.max"
            }
        },
        "aggs": {
            "top_hits": {
                "size": 10, // N: Changing the number of results doesn't change the top_avg_price 
                "_source": {
                    "includes": [
                        "price.max"
                    ]
                }
            }
        }
    },
    "explain": "false"
}

The avg aggregation is computed over all matching documents, not over the top_hits results. I guess top_avg_price should be a sub-aggregation of top_hits, but I don't think that's possible at the moment.
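One way around this, assuming the client can post-process the response, is to request the top N hits (sorted by _score) and average them outside Elasticsearch. Below is a minimal Python sketch of that idea. The `top_n_avg` helper and the sample response are made up for illustration, and it assumes price.max comes back as a nested object under `_source`:

```python
from typing import Any, Dict, Optional


def top_n_avg(response: Dict[str, Any], n: int) -> Optional[float]:
    """Average price.max over the first n hits of a search response.

    Assumes the hits are already ordered by _score (descending), as the
    "sort" clause in the request above asks for.
    """
    hits = response["hits"]["hits"][:n]
    prices = []
    for hit in hits:
        # Skip hits that do not carry the field at all.
        value = hit.get("_source", {}).get("price", {}).get("max")
        if value is not None:
            prices.append(value)
    return sum(prices) / len(prices) if prices else None


# Made-up response fragment for illustration only.
sample = {
    "hits": {
        "hits": [
            {"_score": 3.0, "_source": {"price": {"max": 100.0}}},
            {"_score": 2.0, "_source": {"price": {"max": 200.0}}},
            {"_score": 1.0, "_source": {"price": {"max": 600.0}}},
        ]
    }
}

# Unlike the aggregation above, the result depends on N.
print(top_n_avg(sample, 2))
print(top_n_avg(sample, 3))
```

For this to work the request would need "size" set to N rather than 0, so that the hits are actually returned.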

Upvotes: 0

Joe - Check out my books

Reputation: 16925

When you debug what's available from within the script

GET prices/_search
{
  "size": 0,
  "aggs": {
    "weighted_avg_price": {
      "weighted_avg": {
        "value": {
          "field": "price"
        },
        "weight": {
          "script": "Debug.explain(new ArrayList(params.keySet()))"
        }
      }
    }
  }
}

the following gets spit out

[doc, _source, _doc, _fields]

None of these contain information about the query _score that you're trying to access because aggregations operate in a context separate from the query-level scoring. This means the weight value needs to either

  • exist in the doc or
  • exist in the doc + be modifiable or
  • be a query-time constant (like 42 or 0.1)

A workaround could be to derive the weight from a document field instead of _score, for example by applying a math function to the retrieved price:

"script": "Math.pow(doc.price.value, 0.5)"
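If the weight really has to be the query score, another option is to leave the aggregation framework and compute the weighted average on the client from the returned hits, where each hit carries its `_score`. Here is a rough Python sketch, assuming the hits have been fetched with a large enough "size". The `score_weighted_avg` helper and the sample hits are hypothetical and not part of any Elasticsearch client:

```python
from typing import Any, Dict, List, Optional, Sequence


def score_weighted_avg(
    hits: List[Dict[str, Any]],
    field_path: Sequence[str] = ("price", "avg"),
) -> Optional[float]:
    """Compute sum(score * value) / sum(score) over the given hits."""
    weighted_sum = 0.0
    weight_total = 0.0
    for hit in hits:
        score = hit.get("_score")
        # Walk the nested _source object down to the price field.
        value: Any = hit.get("_source", {})
        for key in field_path:
            value = value.get(key) if isinstance(value, dict) else None
        # Hits without a score or without the field are skipped.
        if score is None or value is None:
            continue
        weighted_sum += score * value
        weight_total += score
    return weighted_sum / weight_total if weight_total else None


# Made-up hits for illustration only.
sample_hits = [
    {"_score": 3.0, "_source": {"price": {"avg": 10.0}}},
    {"_score": 1.0, "_source": {"price": {"avg": 30.0}}},
]

# (3*10 + 1*30) / (3 + 1): the better match pulls the average toward 10.
print(score_weighted_avg(sample_hits))
```

The trade-off is that only the hits actually returned contribute, so this approximates the weighted average over the top results rather than over every matching document.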

Upvotes: 1
