Jared

Reputation: 2806

solr score document that has all terms the same regardless of frequency of terms

I have a requirement on how results should come back sorted from Solr. At a high level they should look like this:

Currently I am sorting on Solr score and then date. When I query Solr I use a boost function that gives an inverse boost to older documents, so they get moved down and newer documents 'float' to the top. I also boost the appropriate fields so that I get exact, partial, and fuzzy matches in the correct order. This has gotten me most of the way there.

Now for the tricky part. The requirement states that if I search for something like 'red ford truck', the documents that contain 'red ford truck' should be scored the same, regardless of the frequency of the terms. The boost that moves newer docs to the top doesn't affect the score enough to push documents with a higher term frequency down far enough.

For example let's say I have 2 documents: doc 1:

doc 2:

When I search for 'red ford truck' I want document 2 to appear first because it is newer and has all the queried terms. Currently document 1 will appear first because it has more matches in Field1 and the inverse boost doesn't do enough to push it down.

So now for my question: is there a configuration point in Solr to tell it to match on queried terms exactly once for a document? Kind of like EXISTS in T-SQL.

If there is any other information that would be helpful, let me know. Thank you in advance for your time.

Upvotes: 2

Views: 1482

Answers (1)

javanna

Reputation: 60195

Those scores are different because of both the term frequency and the length of the field.

omitNorms seems to be what you're looking for regarding the length of the field. Have a look at this previous answer, and remember that index-time boosting will be disabled for that field too:

If true, omits the norms associated with this field (this disables length normalization and index-time boosting for the field, and saves some memory).

omitTermFreqAndPositions seems to be what you're looking for regarding the term frequency:

If true, omits term frequency, positions, and payloads from postings for this field. This can be a performance boost for fields that don't require that information. It also reduces the storage space required for the index. Queries that rely on position that are issued on a field with this option will silently fail to find documents. This property defaults to true for all fields that are not text fields.
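
As a rough sketch, both options are set as attributes on the field definition in schema.xml. The field name `Field1` is taken from the question; the `text_general` type and the `indexed`/`stored` values are assumptions, so adjust them to match your own schema:

```xml
<!-- schema.xml fragment (illustrative only).
     "Field1" comes from the question; the "text_general" type is an assumption.
     omitNorms="true" drops length normalization (and index-time boosts).
     omitTermFreqAndPositions="true" drops term frequency and positions,
     so a term matching once or many times contributes the same to the score. -->
<field name="Field1" type="text_general" indexed="true" stored="true"
       omitNorms="true"
       omitTermFreqAndPositions="true"/>
```

These settings are applied at index time, so existing documents need to be reindexed before the scores change. Keep in mind the caveat from the quote above: with positions omitted, phrase queries on this field will no longer match, which may matter if your exact-match boosting relies on them.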

Upvotes: 2
