Reputation: 7940
I'm using the rord() function in Solr queries to boost query results against a "rank" field, with a syntax something like this:
bf=rord(cur_rank)^1.8
The algorithm works well, but recent changes in Solr indicate that using ord() and rord() is now a memory hog. From the changelog:
Searching and sorting is now done on a per-segment basis, meaning that the FieldCache entries used for sorting and for function queries are created and used per-segment and can be reused for segments that don't change between index updates. While generally beneficial, this can lead to increased memory usage over 1.3 in certain scenarios:
[...]
2) Certain function queries such as ord() and rord() require a top level FieldCache instance and can thus lead to increased memory usage. Consider replacing ord() and rord() with alternatives, such as function queries based on ms() for date boosting.
It mentions possible strategies for handling date-based boosting, but how about for a number like "rank" where rank is a number between 1 and the total number of records?
rord() seems ideal... any other strategies?
Upvotes: 3
Views: 1003
Reputation: 9964
The point of using segment-based field caches is to reduce load time: if you want the value of a field after adding a new segment (which happens every time you commit), you only have to load a new field cache for the newly added segment.
This is not possible with ord and rord, which give you the ordinal over the whole index instead of the value for a single document.
So the only solution for you would be to compute the boost based on the value of the field "cur_rank" instead of its ord.
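A minimal sketch of what that could look like for your field, using recip() (the constants here are purely illustrative and would need tuning for your rank range):
bf=recip(cur_rank,1,1000,1000)^1.8
recip(x,m,a,b) evaluates to a/(m*x+b), so a document with cur_rank 1 gets close to the full boost and the value falls off as the rank grows, all computed from the per-document field value rather than a top-level ordinal.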
This is how date boosting now works: it used to use the rord of the date field to compute the boost, whereas it now uses the number of milliseconds between the value of the date field and now. See http://wiki.apache.org/solr/SolrRelevancyFAQ ("How can I boost the score of newer documents") for more details.
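For reference, the example given in that FAQ entry combines recip() with ms() like this (mydatefield is the FAQ's placeholder field name):
recip(ms(NOW,mydatefield),3.16e-11,1,1)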
Upvotes: 3