Puneet Pandey

Elasticsearch random_score pushes documents towards the end of results

Here's the logic I am trying to accomplish:

I am using Elasticsearch to display top-selling products, and I use a function_score query to randomly mix newly created products into the results.

The issue is that I apply random_score to the newly created products, and the query does insert new products up to page 2 or 3, but all the remaining newly created products are then pushed towards the end of the search results.

Here's the logic written for function_score:

function_score: {
  query: query,
  functions: [
    {
       filter: [
         { terms: { product_type: 'sponsored' } },
         { range: { live_at: { gte: 'CURRENT_DATE - 1.MONTH' } } }
       ],
       random_score: {
         seed: Time.current.to_i / (60 * 10), # new seed every 10 minutes
         field: '_seq_no'
       },
       weight: 0.975
    },
    {
       filter: { range: { live_at: { lt: 'CURRENT_DATE - 1.MONTH' } } },
       linear: {
         weighted_sales_rate: {
           decay: 0.9,
           origin: 0.5520974289580515,
           scale: 0.5520974289580515
         }
       },
       weight: 1
    }
  ],
  score_mode: 'sum',
  boost_mode: 'replace'
}
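
The seed expression above can be checked in isolation. This is a minimal sketch in plain Ruby, assuming Time.now in place of Rails' Time.current; the helper name ten_minute_seed and the sample timestamp are hypothetical and only there for illustration:

```ruby
# Hypothetical helper: one seed per 10-minute window, using plain Ruby Time
# instead of Rails' Time.current.
WINDOW_SECONDS = 60 * 10

def ten_minute_seed(time = Time.now)
  # Integer division drops the remainder, so every timestamp inside the
  # same 600-second window maps to the same seed.
  time.to_i / WINDOW_SECONDS
end

# Sample timestamp chosen to sit exactly on a window boundary.
base = Time.at(1_700_000_400)

same_window = ten_minute_seed(base) == ten_minute_seed(base + 599)
next_window = ten_minute_seed(base + 600) == ten_minute_seed(base) + 1

puts same_window # true: the ordering stays stable inside one window
puts next_window # true: the seed advances by one in the next window
```

Because the seed only changes at window boundaries, paging through results within one window should see a consistent random ordering.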

And then I am sorting based on {"_score" => { "order" => "desc" } }

Let's say 100 sponsored products were created in the last month. The query above then displays 8-10 of them at random (3 to 4 per page) as I scroll through the first 2 or 3 pages, but the other 90-92 products are all displayed on the last few pages of the results. This happens because the score that random_score calculates for those 90-92 products is lower than the score calculated by the linear decay function.

Kindly suggest how I can modify this query so that newly created products keep appearing as I navigate through the pages, instead of being pushed towards the end of the results.
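
A rough back-of-the-envelope check of those numbers, as a sketch rather than a measurement. It assumes that random_score is roughly uniform on [0, 1), and that most older products get a linear decay score of at least 0.9 (a threshold borrowed from the answer below, not something confirmed for this index). The method name expected_visible is hypothetical:

```ruby
# Assumptions (not from the original query): random_score is roughly uniform
# on [0, 1), and most older products score at least `rival_score` from the
# linear decay function.
def expected_visible(sponsored_count:, weight:, rival_score:)
  # With weight applied, a sponsored product scores in [0, weight).
  max_random = weight
  return 0.0 if rival_score >= max_random

  # Fraction of the uniform range that lands above the rival score.
  fraction_above = 1.0 - (rival_score / max_random)
  sponsored_count * fraction_above
end

estimate = expected_visible(sponsored_count: 100, weight: 0.975, rival_score: 0.9)
puts estimate.round(1) # 7.7
```

Under these assumptions only about 8 of the 100 sponsored products are expected to outrank the older products, which is in the same range as the 8-10 observed.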

[UPDATE]

I tried adding a gauss decay function to this query (to adjust the scores of the products that appear towards the end of the results), as shown below:

{
  filter: [
    { terms: { product_type: 'sponsored' } },
    { range: { live_at: { gte: 'CURRENT_DATE - 1.MONTH' } } },
    { range: { "_score" => { lt: 0.9 } } }
  ],
  gauss: {
    views_per_age_and_sales: {
      origin: 1563.77,
      scale: 1563.77,
      decay: 0.95
    }
  },
  weight: 0.95
}

But this does not work either.

Links I have referred to:

  1. https://intellipaat.com/community/12391/how-to-get-3-random-search-results-in-elasticserch-query
  2. Query to get random n items from top 100 items in Elastic Search
  3. https://www.elastic.co/guide/en/elasticsearch/reference/7.17/query-dsl-function-score-query.html

Answers (1)

Puneet Pandey

I am not sure if this is the best solution, but I was able to accomplish this by wrapping the original query in a function_score query with a script_score function, and by adding a new indexed field to Elasticsearch called sort_by_views_per_year. Here's how the solution looks:

Link I referred to: https://github.com/elastic/elasticsearch/issues/7783

attribute(:sort_by_views_per_year) do
  object.live_age&.positive? ? object.views_per_year.to_f / object.live_age : 0.0
end
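
To see how the guard in that attribute behaves, here is a small standalone sketch. ProductStub is a hypothetical stand-in for the real model, and the sample numbers are made up for illustration:

```ruby
# Hypothetical stand-in for the real product object.
ProductStub = Struct.new(:views_per_year, :live_age)

def sort_by_views_per_year(object)
  # &. returns nil for a nil live_age, so nil, zero and negative ages
  # all fall through to 0.0 instead of raising or dividing by zero.
  object.live_age&.positive? ? object.views_per_year.to_f / object.live_age : 0.0
end

puts sort_by_views_per_year(ProductStub.new(300, 3))   # 100.0
puts sort_by_views_per_year(ProductStub.new(300, nil)) # 0.0
puts sort_by_views_per_year(ProductStub.new(300, 0))   # 0.0
```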

Then while querying ElasticSearch:

def search
  #...preparation of query...#
  query = original_query(query)
  query = rearrange_low_scoring_docs(query)

  sort = apply_sort opts[:sort]

  Product.search(query: query, sort: sort)
end

I have not changed anything in original_query (i.e. it still applies random_score to products that went live within the last month and the linear decay function to older ones).

def rearrange_low_scoring_docs query
  {
    function_score: {
      query: query,
      functions: [
        {
          script_score: {
            script: "if (_score.doubleValue() < 0.9) {return 0.9;} else {return _score;}"
          }
        }
      ],
      #score_mode: 'sum',
      boost_mode: 'replace'
    }
  }
end

Then finally my sorting looks like this:

def apply_sort(_requested_sort = nil) # search passes opts[:sort]; it is not used here
  [
    { '_score' => { 'order' => 'desc' } },
    { 'sort_by_views_per_year' => { 'order' => 'desc' } }
  ]
end
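
The combined effect of the score floor and the two-level sort can be simulated in plain Ruby. This is only a sketch: the Doc struct, product names, scores, and view counts are made-up sample data, and the max call mirrors the script's "raise anything below 0.9 to 0.9" rule:

```ruby
# Hypothetical sample documents: name, original _score, secondary sort field.
Doc = Struct.new(:name, :score, :sort_by_views_per_year)

FLOOR = 0.9

docs = [
  Doc.new('old_top_seller', 0.98, 10.0),
  Doc.new('new_lucky',      0.95, 1.0),
  Doc.new('new_unlucky_a',  0.20, 50.0),
  Doc.new('old_slow',       0.85, 5.0),
  Doc.new('new_unlucky_b',  0.40, 2.0)
]

# Same rule as the script_score: scores below the floor become the floor.
floored = docs.map { |d| Doc.new(d.name, [d.score, FLOOR].max, d.sort_by_views_per_year) }

# _score descending first, then sort_by_views_per_year descending as tie-breaker.
ranked = floored.sort_by { |d| [-d.score, -d.sort_by_views_per_year] }
order = ranked.map(&:name)

puts order.inspect
# ["old_top_seller", "new_lucky", "new_unlucky_a", "old_slow", "new_unlucky_b"]
```

Every document that was below 0.9 now ties at 0.9, so within that group the order is decided by sort_by_views_per_year rather than by the random score.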

It would be very helpful if the Elasticsearch random_score function supported attributes such as max_doc_to_include and min_score, so that I could use it like this:

{
  filter: [
    { terms: { product_type: 'sponsored' } },
    { range: { live_at: { gte: 'CURRENT_DATE - 1.MONTH' } } }
  ],
  random_score: {
    seed: 123456, # new seed every 10 minutes
    field: '_seq_no',
    max_doc_to_include: 10,
    min_score: 0.9
  },
  weight: 0.975
},
