Kasturi Chavan

Reputation: 17

Solr Boosting Logic Concepts

I'm trying to understand boosting, and whether boosting is the answer to my problem. I have an index that holds different types of data, e.g. an index Animals. One of the fields is animaltype, whose value can be Carnivorous, Herbivorous, etc. When we run a search query, I want to show results of type Carnivorous at the top, followed by the Herbivorous type. Also, would it be possible to show only, say, the top 3 results from one type and then the remaining results from other types?

Let's assume that for the Herbivorous type we have a field named vegetables. This field will have values only for the Herbivorous animaltype. Is it possible to specify boosting rules as follows? Boost levels: animaltype:Carnivorous, then animaltype:Herbivorous and vegetables field: spinach, then animaltype:Herbivorous and vegetables field: carrot

etc. Basically, boosting on various fields at various levels. I'm new to this concept, so it would be really helpful to get some input/guidance.

Thanks, Kasturi Chavan

Upvotes: 0

Views: 339

Answers (1)

MatsLindh

Reputation: 52802

Your example is closer to sorting than boosting, since you have a priority list for how important each document is, while boosting (in Solr) is usually applied more fluidly, meaning that there is no hard line between documents of type X and type Y.

However, boosting with sufficiently large values will in effect give you the same result, putting the documents into different score "areas" which will then give you the sort order you're looking for. You can see the score contributed by each term by appending debugQuery=true to your query. Boosting says that 'a document with this value is z times more important than those with a different value', but if a boosted document only contains low-scoring tokens from the search (usually very common words), while other documents contain high-scoring tokens (infrequent words), the latter documents might still be considered more important.

Example: searching for "city paris", where most documents contain the word 'city', but only a few contain the word 'paris' (and those do not contain 'city'). Even if you boost all documents assigned to the country 'germany', the score contributed by 'city', even with the boost factor, might still be lower than what 'paris' contributes alone. This might not occur in real life, but you should know what the boost actually changes.

Using the edismax query parser, you can apply the boost in two different ways: either use boost=, which is multiplicative, or use bq= or bf=, which are additive. The difference is how the boost contributes to the final score.
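To make the difference concrete, here is a small arithmetic sketch in Python. It is a deliberately simplified model, not Solr's actual scoring: the base scores and boost values are invented for illustration, and a real bq adds the score of the boost query rather than a fixed constant.

```python
# Simplified model of additive vs. multiplicative boosting.
# All numbers below are hypothetical and only illustrate the arithmetic.

def additive(base_score, boost_score):
    """Roughly what bq/bf do: the boost is added to the base score."""
    return base_score + boost_score

def multiplicative(base_score, factor):
    """Roughly what boost= does: the base score is multiplied."""
    return base_score * factor

paris_doc = 8.0  # matches the rare term 'paris', not boosted
city_doc = 0.5   # matches only the common term 'city', but is boosted

small_add = additive(city_doc, 2.0)       # small boost, still below paris_doc
large_add = additive(city_doc, 1000.0)    # large boost dominates the base score
times_ten = multiplicative(city_doc, 10)  # x10 of a tiny score is still small

print(small_add, large_add, times_ten)
```

With a small additive boost or a moderate multiplicative one, the unboosted 'paris' document can still rank first; only the large additive boost pushes the boosted document into its own score range.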

For your example, the easiest way to get something similar to what you're asking for is to use bq (boost query):

bq=animaltype:Carnivorous^1000&
bq=animaltype:Herbivorous^10

These boosts will probably be large enough to move all documents matching these queries into their own buckets, without documents crossing between groups. To create "different levels" as your example shows, you'll need to tweak these values (and remember, multiple boosts can be applied to the same document, for example if an animal is both herbivorous and eats spinach).
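As a rough sketch of how the multi-level version could be assembled from client code, the Python below builds the request parameters with one bq per level. The boost values (1000, 100, 10), the example query term and the exact level definitions are assumptions that would need tuning against debugQuery output; the field names follow the question.

```python
from urllib.parse import urlencode

def build_boost_params(user_query):
    """Return edismax parameters with one additive boost query per level.

    Boost values are hypothetical placeholders, not recommended settings.
    """
    return [
        ("q", user_query),
        ("defType", "edismax"),
        # Level 1: all carnivores
        ("bq", "animaltype:Carnivorous^1000"),
        # Level 2: herbivores that eat spinach
        ("bq", "(animaltype:Herbivorous AND vegetables:spinach)^100"),
        # Level 3: herbivores that eat carrot
        ("bq", "(animaltype:Herbivorous AND vegetables:carrot)^10"),
        # Shows how much each clause contributes to the score
        ("debugQuery", "true"),
    ]

# A list of tuples lets the bq key repeat in the encoded query string.
query_string = urlencode(build_boost_params("lion"))
print(query_string)
```

The resulting string can be appended to the select URL of your collection. A herbivore that eats both spinach and carrot matches two of the boost queries, so both boosts are added to its score.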

A different approach would be to create a function query using query, if and similar functions to result in a single integer value that you can use as a sorting value. You can also calculate this value when indexing the document if it's static (which your example is), and then sort by that field instead. It will require you to reindex your documents if the sorting values change, but it might be an easy and effective solution.
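A minimal sketch of the index-time variant, assuming a hypothetical integer field named priority_rank and the level ordering from the question (the rank numbers themselves are arbitrary, only their order matters):

```python
def priority_rank(doc):
    """Map a document to a single integer; higher means shown earlier.

    'priority_rank' and the numeric levels are illustrative assumptions.
    """
    animal_type = doc.get("animaltype", "").lower()
    vegetables = {v.lower() for v in doc.get("vegetables", [])}

    if animal_type == "carnivorous":
        return 4
    if animal_type == "herbivorous" and "spinach" in vegetables:
        return 3
    if animal_type == "herbivorous" and "carrot" in vegetables:
        return 2
    if animal_type == "herbivorous":
        return 1
    return 0

docs = [
    {"id": "1", "animaltype": "Herbivorous", "vegetables": ["carrot"]},
    {"id": "2", "animaltype": "Carnivorous"},
    {"id": "3", "animaltype": "Herbivorous", "vegetables": ["spinach", "carrot"]},
]

# Attach the rank before sending the documents to the indexer.
for doc in docs:
    doc["priority_rank"] = priority_rank(doc)

print([(d["id"], d["priority_rank"]) for d in docs])
```

At query time you would then sort on that field first and fall back to relevance within each level, for example sort=priority_rank desc, score desc.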

To achieve the "Top 3 results from a type" you're probably going to want to look at Result Grouping, which makes it possible to get "x documents" for each value in a single field. There is, as far as I know, no way to say "I want three of these at the top, then the rest from other values", except for doing multiple queries (and excluding the three you've already retrieved from the second query). Usually, issuing multiple queries performs just as well (or better).
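Both options can be sketched as plain parameter builders in Python. This assumes the unique key field is called id and that the IDs from the first response are passed in by the caller; the helper names are made up for the example, and no HTTP client is shown.

```python
def grouping_params(user_query, per_group=3):
    """Result Grouping: up to `per_group` documents per animaltype value."""
    return {
        "q": user_query,
        "group": "true",
        "group.field": "animaltype",
        "group.limit": per_group,
    }

def top_n_params(user_query, animal_type, n=3):
    """First query: only the top `n` documents of one type."""
    return {
        "q": user_query,
        "fq": f"animaltype:{animal_type}",
        "rows": n,
        "fl": "id,score",
    }

def remainder_params(user_query, exclude_ids, rows=10):
    """Second query: everything else, skipping the IDs already shown."""
    params = {"q": user_query, "rows": rows}
    if exclude_ids:
        quoted = " ".join(f'"{doc_id}"' for doc_id in exclude_ids)
        params["fq"] = f"-id:({quoted})"
    return params

first = top_n_params("lion", "Carnivorous")
# 'a1', 'b2', 'c3' stand in for the IDs returned by the first query.
second = remainder_params("lion", ["a1", "b2", "c3"])
print(first)
print(second)
```

The application shows the results of the first query, then appends the results of the second one.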

Upvotes: 1
