RHSeeger

Reputation: 16262

Adding date boosting to complex SOLR queries

I currently have a SOLR query which uses the query (q), query fields (qf) and phrase fields (pf) to retrieve the results I want. An example is:

/solr/select
?q=superbowl
&qf=title^3+headline^2+intro+fulltext
&pf=title^3+headline^2+intro+fulltext
&fl=id,title,ts_modified,score
&debugQuery=true

The idea is that the title and headline of the "main item" give the best indication of what the result is "about", but the intro and fulltext provide some input too. For example, imagine a collection of links, where the collection itself has metadata (what it's a collection of), but each link has its own data (title of the link, synopsis, etc.). If we search for "superbowl", the most relevant results are the ones with "superbowl" in the collection metadata, and the least relevant results are those with "superbowl" in just the synopsis of one of the links... but they're all valid results.

What I'm trying to do is add a boost to the relevancy score so that the most recent results float towards the top, while retaining title, headline, intro and fulltext as part of the formula. A recent result with the search string in the collection metadata would be more relevant than one with it only in the links' metadata... but that "links only" recent result might be more relevant than a very old result with the search string in the collection metadata. (I hope that's somewhat clear.)

The problem is that I can't figure out how to combine the boost function documented on the SOLR site with the use of the qf/pf fields. Specifically...

From the SOLR site, something like the following works to boost the results by date:

/solr/select
?q={!boost%20b=$dateboost%20v=$qq}
&dateboost=ord(ts_modified)
&qq=superbowl
&fl=ts_modified,score
&debugQuery=true

However, I can't figure out how to combine that query with the use of qf and pf. Any suggestions would be more than welcome.

Thanks to danben's response, I was able to come up with the following:

/solr/select
?q={!boost%20b=$dateboost%20v=$qq%20defType=dismax}
&dateboost=ord(ts_modified)
&qq=superbowl
&qf=title^3+headline^2+intro^2+fulltext
&pf=title^3+headline^2+intro^2+fulltext
&fl=ts_modified,score
&debugQuery=true
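
For anyone assembling this request in code rather than by hand, here is a minimal Python sketch of the same query. The helper name build_boosted_dismax_query is made up for illustration, and the field names and weights are simply copied from the example above. The point is that the {!boost} wrapper goes in q, the user's text moves to qq, and qf/pf stay as ordinary top-level parameters that the nested dismax parser picks up.

```python
from urllib.parse import urlencode, parse_qs


def build_boosted_dismax_query(terms, date_field="ts_modified"):
    """Build the /solr/select URL for a dismax query wrapped in {!boost}.

    Hypothetical helper: the field names and weights mirror the example
    in the question and are not anything Solr requires.
    """
    params = [
        # The boost parser multiplies the score of $qq by $dateboost.
        ("q", "{!boost b=$dateboost v=$qq defType=dismax}"),
        ("dateboost", f"ord({date_field})"),
        ("qq", terms),
        # qf/pf are read by the nested dismax query.
        ("qf", "title^3 headline^2 intro^2 fulltext"),
        ("pf", "title^3 headline^2 intro^2 fulltext"),
        ("fl", f"{date_field},score"),
        ("debugQuery", "true"),
    ]
    # urlencode percent-encodes the local-params syntax and turns spaces into '+'.
    return "/solr/select?" + urlencode(params)


url = build_boosted_dismax_query("superbowl")
print(url)

# Round-trip check: decoding the query string gives back the raw parameters.
decoded = parse_qs(url.split("?", 1)[1])
print(decoded["q"][0])
```

Letting urlencode do the escaping avoids hand-writing %20 inside the local params, which is easy to get wrong.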

It looks like the actual problems I was having were:

Upvotes: 4

Views: 8951

Answers (2)

kenorb

Reputation: 166507

Here is a nice article about Date-boosting Solr search results:

http://www.metaltoad.com/blog/date-boosting-solr-drupal-search-results


In Drupal, using the Apachesolr module, this can be achieved with the following code:

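/**
 * Implements hook_apachesolr_query_alter().
 *
 * Replace "mymodule" with the machine name of your own module.
 */
function mymodule_apachesolr_query_alter(DrupalSolrQueryInterface $query) {
  // Add a boost function that favours recently dated documents.
  $query->addParam('bf', array('freshness' =>
    'recip(abs(ms(NOW/HOUR,dm_field_date)),3.16e-11,1,.1)'
  ));
}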

Upvotes: 0

danben

Reputation: 83250

Check out http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents

This is based on the ms function, which returns the difference in milliseconds between two timestamps or dates, and ReciprocalFloatFunction (recip in function-query syntax), whose result increases as the value passed to it decreases.
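
To get a feel for how such a boost decays with age, here is a rough Python sketch of the arithmetic. It relies on recip(x,m,a,b) evaluating to a / (m*x + b). The constants are illustrative assumptions: m = 3.16e-11 is roughly one over the number of milliseconds in a year, and a = b = 1 is just one convenient choice. The helper names are made up for this example.

```python
# Milliseconds in an average year, about 3.16e10.
MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000


def recip(x, m, a, b):
    """Mirror of Solr's recip(x,m,a,b): a / (m*x + b)."""
    return a / (m * x + b)


def date_boost(age_ms, m=3.16e-11, a=1.0, b=1.0):
    """Boost for a document that is age_ms milliseconds old.

    Illustrative constants: with m close to 1/MS_PER_YEAR and a = b = 1,
    a brand-new document scores 1.0 and a one-year-old one about half that.
    """
    return recip(abs(age_ms), m, a, b)


for years in (0, 1, 5, 10):
    print(years, "years old ->", round(date_boost(years * MS_PER_YEAR), 3))
```

Raising b flattens the curve so that very recent documents do not dominate; raising m makes the boost fall off faster.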

Since you are using the DisMaxRequestHandler, you may need to specify your query using the bq/bf parameters. From http://lucene.apache.org/solr/api/org/apache/solr/handler/DisMaxRequestHandler.html:

bq - (Boost Query) a raw lucene query that will be included in the user's query to influence the score. If this is a BooleanQuery with a default boost (1.0f), then the individual clauses will be added directly to the main query. Otherwise, the query will be included as is. This param can be specified multiple times, and the boosts are additive. NOTE: the behaviour listed above is only in effect if a single bq parameter is specified. Hence you can disable it by specifying an additional, blank, bq parameter.

bf - (Boost Functions) functions (with optional boosts) that will be included in the user's query to influence the score. Format is: "funcA(arg1,arg2)^1.2 funcB(arg3,arg4)^2.2". NOTE: Whitespace is not allowed in the function arguments. This param can be specified multiple times, and the functions are additive.
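
As a hedged sketch of what that could look like for the query in the question, the Python below builds a dismax request that keeps qf and pf and adds a date-based bf. The helper name build_dismax_bf_query is made up, the recip constants are illustrative, and it assumes ts_modified is a date field type that ms() can operate on, which the question does not confirm.

```python
from urllib.parse import urlencode, parse_qs


def build_dismax_bf_query(terms, date_field="ts_modified"):
    """Build a dismax request with an additive date boost in bf.

    Hypothetical helper: assumes date_field is a date type usable with ms().
    """
    params = [
        ("defType", "dismax"),
        ("q", terms),
        ("qf", "title^3 headline^2 intro fulltext"),
        ("pf", "title^3 headline^2 intro fulltext"),
        # No whitespace inside the function arguments, as bf requires.
        ("bf", f"recip(ms(NOW/HOUR,{date_field}),3.16e-11,1,1)"),
        ("fl", f"id,title,{date_field},score"),
        ("debugQuery", "true"),
    ]
    return "/solr/select?" + urlencode(params)


url = build_dismax_bf_query("superbowl")
print(url)

# Decode again to inspect the boost function that will reach Solr.
decoded = parse_qs(url.split("?", 1)[1])
print(decoded["bf"][0])
```

Because bf is added to the score rather than multiplied into it, its effect depends on how large the text-match scores are; appending a weight such as ^10 to the function, or comparing scores with debugQuery, is usually needed to tune it.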

Upvotes: 4
