Multiplicative boosts for categorical variable

Question

Let's say my Solr index has a field called "item_type" with categorical values like "abc", "defg", "hij", "kl", "mno". In my fulltext search (which uses other indexed fields), I'd like to apply a multiplicative boost to items whose "item_type" value is contained in a predefined subset like "defg", "hij", "mno".

https://github.com/sunspot/sunspot?tab=readme-ov-file#functions only has examples for multiplicative boosts with numeric fields, and for additive boosts with both numeric and categorical fields.

Something like

Model.search do
  fulltext search_input do
    boost_fields title: 10.0
  end
  boost_multiplicative(3.0) { with(:item_type, %w[defg hij mno]) }
end

ignores the block; Sunspot::DSL::StandardQuery#boost_multiplicative doesn't take a "&block" argument the way Sunspot::DSL::StandardQuery#boost does. Without a block, boost_multiplicative(3.0) multiplies the score of every result item by the same constant factor, which leaves the ranking unchanged, so I don't see what that would be good for.

How can I write something like the following function, with fast response times on millions of items?

boost_multiplicative(function { if(contained(:item_type, %w[defg hij mno]), 3.0, 1.0) })

Is the only option a precalculated field such as "boost_value", set to 3.0 for the selected item types and 1.0 for all others, that is added to the Solr index? That would mean such multiplicative boosts couldn't be changed dynamically, as is needed in e.g. a product search with adjustable boost weights.

According to https://solr.apache.org/guide/8_11/the-dismax-query-parser.html#bq-boost-query-parameter, additive boosts, which are what Sunspot::DSL::StandardQuery#boost uses, don't make much sense:

Additive Boosts vs Multiplicative Boosts

Generally speaking, using bq (or bf, below) is considered a poor way to "boost" documents by a secondary query because it has an "Additive" effect on the final score. The overall impact a particular bq parameter will have on a given document can vary a lot depending on the absolute values of the scores from the original query as well as the bq query, which in turn depends on the complexity of the original query, and various scoring factors (TF, IDF, average field length, etc.)

"Multiplicative Boosting" is generally considered to be a more predictable method of influencing document score, because it acts as a "scaling factor" — increasing (or decreasing) the scores of each document by a relative amount.

The {!boost} QParser provides a convenient wrapper for implementing multiplicative boosting, and the {!edismax} QParser offers a boost query parameter shortcut for using it.
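
Going by that last sentence, one direction that might work is to bypass the boost_multiplicative DSL method and set the edismax "boost" parameter directly. The following is only an untested sketch: categorical_boost_params is a hypothetical helper of my own (not part of Sunspot), the Solr field name "item_type_s" assumes Sunspot's default suffix for a single-valued string field, "boostq" is an arbitrary parameter name, and I'm assuming the fulltext query is sent with the edismax parser. It relies on Solr's if, exists and query functions and on Sunspot's adjust_solr_params hook.

```ruby
# Hypothetical helper (not part of Sunspot): builds Solr request parameters
# that multiply the score by `factor` for documents whose `solr_field`
# matches one of `values`, and by 1.0 for all other documents.
def categorical_boost_params(solr_field, values, factor, param_name: "boostq")
  raise ArgumentError, "values must not be empty" if values.empty?

  # Quote each value and escape embedded quotes/backslashes so that Solr
  # treats it as a literal term.
  quoted = values.map { |v| '"' + v.to_s.gsub(/(["\\])/) { "\\#{$1}" } + '"' }

  {
    # Sub-query selecting the boosted categories; referenced below via $name.
    param_name.to_sym => "#{solr_field}:(#{quoted.join(' OR ')})",
    # edismax multiplies the main score by the value of this function.
    boost: "if(exists(query($#{param_name})),#{Float(factor)},1.0)"
  }
end

# Assumed usage inside a Sunspot search (requires a running Solr, untested):
#
# Model.search do
#   fulltext search_input do
#     boost_fields title: 10.0
#   end
#   adjust_solr_params do |params|
#     # Note: this overwrites any :boost parameter that is already set.
#     params.merge!(categorical_boost_params("item_type_s", %w[defg hij mno], 3.0))
#   end
# end
```

Since the factor and the list of values would be plain request parameters, they could be changed per search without reindexing. I have not measured how this behaves on millions of items.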

Does anyone have advice on this?

Answers (0)