Calculation of QF and BF in Solr eDisMax

Question

I'm trying to use both the qf and bf parameters, but I can't make sense of the resulting score. How is the score calculated?

Below is the query I've made: http://localhost:8983/solr/core name/select?bf=likes^0.8%20created_time^0.8&debugQuery=on&defType=edismax&indent=on&q=Samsung&qf=description^0.8&wt=json

And this is the score I got:

"rawquerystring":"Samsung"

"querystring":"Samsung"

"parsedquery":"(+DisjunctionMaxQuery(((description:samsung)^0.8)) FunctionQuery(date(created_time))^0.8 FunctionQuery(int(likes))^0.8)/no_coord"
"parsedquery_toString":"+((description:samsung)^0.8) (date(created_time))^0.8 (int(likes))^0.8"

"explain":{
  "23379598044_10154409629363045":"
1.19288942E12 = sum of:
  4.6113086 = weight(description:samsung in 61828) [SchemaSimilarity], result of:
    4.6113086 = score(doc=61828,freq=1.0 = termFreq=1.0
), product of:
      0.8 = boost
      4.2966447 = idf(docFreq=780, docCount=57329)
      1.3415436 = tfNorm, computed from:
        1.0 = termFreq=1.0
        1.2 = parameter k1
        0.75 = parameter b
        13.833522 = avgFieldLength
        5.2244897 = fieldLength
  1.19288942E12 = FunctionQuery(date(created_time)), product of:
    1.49111177E12 = date(created_time)=2017-04-02T05:43:00Z
    0.8 = boost
    1.0 = queryNorm
  22.4 = FunctionQuery(int(likes)), product of:
    28.0 = int(likes)=28
    0.8 = boost
    1.0 = queryNorm
",

Pavel Vasilev · Accepted Answer

OK, let's decipher it. First things first: it's always easier to work with the explain output once it's in a more readable form, e.g. with the \n escapes replaced by real line breaks, so that it looks like this:

1.19288942E12 = sum of:
  4.6113086 = weight(description:samsung in 61828) [SchemaSimilarity], result of:
    4.6113086 = score(doc=61828,freq=1.0 = termFreq=1.0), product of:
      0.8 = boost
      4.2966447 = idf(docFreq=780, docCount=57329)
      1.3415436 = tfNorm, computed from:
        1.0 = termFreq=1.0
        1.2 = parameter k1
        0.75 = parameter b
        13.833522 = avgFieldLength
        5.2244897 = fieldLength
  1.19288942E12 = FunctionQuery(date(created_time)), product of:
    1.49111177E12 = date(created_time)=2017-04-02T05:43:00Z
    0.8 = boost
    1.0 = queryNorm
  22.4 = FunctionQuery(int(likes)), product of:
    28.0 = int(likes)=28
    0.8 = boost
    1.0 = queryNorm
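
If you'd rather not do that by hand, here is a minimal Python sketch. It assumes you have the JSON response body as a string; the raw_response value below is a shortened, made-up stand-in that keeps only the likes part of your explain. Parsing the JSON already turns each \n escape into a real line break, so the text only needs printing:

```python
import json

# Hypothetical, shortened stand-in for the response body returned with
# debugQuery=on. The raw string keeps each \n as a JSON escape sequence.
raw_response = r'''
{"debug": {"explain": {"23379598044_10154409629363045":
  "\n22.4 = FunctionQuery(int(likes)), product of:\n  28.0 = int(likes)=28\n  0.8 = boost\n  1.0 = queryNorm\n"}}}
'''


def readable_explains(response_text):
    # json.loads decodes the escapes, so each value already contains
    # real line breaks; only the leading/trailing ones are trimmed here.
    explain = json.loads(response_text)["debug"]["explain"]
    return {doc_id: text.strip("\n") for doc_id, text in explain.items()}


explains = readable_explains(raw_response)
for doc_id, text in explains.items():
    print(doc_id)
    print(text)
```

With a real response you would pass the full body instead of raw_response; each key is a document id and each value is the explain tree for that document.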

To be clear, this explain applies only to the document with id=23379598044_10154409629363045 from your result set.

Let's break down the overall score of 1.19288942E12. As the output says, it is the sum of 3 parts:

4.6113086 = weight(description:samsung in 61828) - this part comes from the query itself. Since your query is q=Samsung with qf=description^0.8, a match in the description field has its score multiplied by a boost of 0.8 (see the Solr reference for the qf parameter). That is exactly what explain says: 4.6113086 is the product of 0.8 = boost, 4.2966447 = idf and 1.3415436 = tfNorm. Please note that the default scoring model in Solr < 6.0 is TF/IDF; the k1 and b parameters in your output belong to BM25, which is the default from Solr 6.0 on.
1.19288942E12 = FunctionQuery(date(created_time)) - this part comes from created_time^0.8 in bf (I'm leaving likes^0.8 aside for the sake of explanation; it is covered in the last item). What Solr (eDisMax) does here is take the value of created_time for this particular document (2017-04-02T05:43:00Z), convert it to milliseconds since the Unix epoch (1.49111177E12) and multiply it by the boost of 0.8. As you can see, 1.19288942E12 is huge compared with the text relevance score of 4.6113086, so it completely dominates the sum. A raw date boost like this is rarely useful in practice, and I would suggest wrapping it in a normalizing function such as recip (reciprocal).
22.4 = FunctionQuery(int(likes)) - this part comes from likes^0.8 in bf. Again, Solr takes the value of the field for this particular document (28) and multiplies it by the boost of 0.8. A small note: if you know the distribution of the values in this field, the raw value may well be good enough in practice. If you don't, it's worth normalizing this field as well, for instance with the scale function. I'm not insisting, it's just a piece of advice :)
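
To double-check the arithmetic, the three parts can be recomputed from the numbers in the explain. The sketch below is plain Python, not Solr code. The idf and tfNorm expressions are the BM25 formulas as I understand Lucene's BM25Similarity to apply them in this version, and the recip helper only mirrors the formula a / (m*x + b) of Solr's recip function. The 30-day document age and the 3.16e-11 constant are illustrative assumptions, not values from your index:

```python
import math

# Values copied from the explain output.
BOOST = 0.8
DOC_FREQ, DOC_COUNT = 780, 57329
TERM_FREQ, K1, B = 1.0, 1.2, 0.75
AVG_FIELD_LENGTH, FIELD_LENGTH = 13.833522, 5.2244897
CREATED_TIME_MS = 1.49111177e12  # date(created_time) as printed by Solr
LIKES = 28.0


def bm25_idf(doc_freq, doc_count):
    # Inverse document frequency: rarer terms get a larger value.
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_tf_norm(freq, k1, b, field_length, avg_field_length):
    # Term frequency, saturated by k1 and normalized by field length via b.
    length_norm = 1 - b + b * field_length / avg_field_length
    return freq * (k1 + 1) / (freq + k1 * length_norm)


def recip(x, m, a, b):
    # Same formula as Solr's recip(x, m, a, b): a / (m*x + b).
    return a / (m * x + b)


text_score = (BOOST
              * bm25_idf(DOC_FREQ, DOC_COUNT)
              * bm25_tf_norm(TERM_FREQ, K1, B, FIELD_LENGTH, AVG_FIELD_LENGTH))
date_score = BOOST * CREATED_TIME_MS  # raw epoch milliseconds times boost
likes_score = BOOST * LIKES
total = text_score + date_score + likes_score

# Assumption for illustration: the document is 30 days old at query time.
age_ms = 30 * 24 * 3600 * 1000
recency_score = BOOST * recip(age_ms, 3.16e-11, 1, 1)

print("text   :", text_score)
print("date   :", date_score)
print("likes  :", likes_score)
print("total  :", total)
print("recency:", recency_score)
```

The text part comes out at about 4.61 and likes at 22.4, while the date part is around 1.19E12, which is why the total is practically just the date boost. With a = 1 and b = 1 the recip value stays between 0 and 1, so a boost built that way (something along the lines of recip(ms(NOW,created_time),3.16e-11,1,1) in bf) would no longer drown out the text score.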

Hopefully this gives you some hints for reading 'explain' output :)
