Vaibhav Raut

Reputation: 95

How to query Solr to get the documents if it matches 50% of the query string?

I am using Solr 7.6 with the following document structure:

{
    "source_ln":"en",
    "source_text":"the sky is blue",
    "target_ln":"hi",
    "target_text":"आसमान नीला है"
},
{
    "source_ln":"en",
    "source_text":"the sky is also called the celestial sphere",
    "target_ln":"hi",
    "target_text":"आकाश को आकाशीय क्षेत्र भी कहा जाता है"
}

All the fields are defined with the StandardTokenizerFactory tokenizer.

When I query "source_text":"the sky", the result set should contain only the first document.

In the second document, the field "source_text":"the sky is also called the celestial sphere" contains 8 terms, while the query "source_text":"the sky" contains only 2 terms. The 50% minimum match criterion is therefore not met, so the second document should not be in the result set.
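To make the criterion concrete, here is a minimal Python sketch of the arithmetic. It assumes a plain whitespace split as a stand-in for StandardTokenizerFactory, and the function name `coverage` is only illustrative; it is not something Solr provides.

```python
def coverage(query: str, field_text: str) -> float:
    """Ratio of query term count to field token count.

    Returns 0.0 when any query term is missing from the field.
    Whitespace splitting is an assumption standing in for Solr's tokenizer.
    """
    query_terms = query.lower().split()
    field_tokens = field_text.lower().split()
    if not field_tokens or not all(t in field_tokens for t in query_terms):
        return 0.0
    return len(query_terms) / len(field_tokens)


doc1 = "the sky is blue"
doc2 = "the sky is also called the celestial sphere"

ratio1 = coverage("the sky", doc1)  # 2 terms / 4 tokens
ratio2 = coverage("the sky", doc2)  # 2 terms / 8 tokens
```

With a threshold of 0.5, the first document passes (2 of 4) and the second does not (2 of 8).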

Is there any way to get the documents matching at least 50% of the query field terms/tokens?

Thanks in advance.

Upvotes: 1

Views: 369

Answers (2)

EricLavault

Reputation: 16045

You can set your request handler to use the (e)dismax query parser, for example via the defType parameter: ?q=...&defType=dismax.

With a dismax parser, you can then use the mm (Minimum Should Match) parameter to suit your needs, simply by setting mm=50%.
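As a rough sketch, such a request could be assembled in Python as shown here. The host, port, and collection name (`translations`) are assumptions for illustration, as are the helper names; `q`, `defType`, `qf`, and `mm` are the standard (e)dismax request parameters. The second helper mirrors the documented rule that a percentage `mm` is rounded down to a whole number of clauses.

```python
from urllib.parse import urlencode

# Hypothetical host and collection name; adjust to your deployment.
SOLR_SELECT_URL = "http://localhost:8983/solr/translations/select"


def build_query_url(text: str, mm: str = "50%") -> str:
    """Build a select URL that uses edismax with a minimum-should-match value."""
    params = {
        "q": text,
        "defType": "edismax",
        "qf": "source_text",  # field to search, taken from the question
        "mm": mm,
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


def min_should_match(optional_clauses: int, percent: int) -> int:
    """Number of clauses a percentage mm requires (rounded down)."""
    return optional_clauses * percent // 100


url = build_query_url("the sky")
required = min_should_match(2, 50)  # two query terms: "the", "sky"
```

Note that mm counts clauses of the query, not tokens of the indexed field: for the two-term query "the sky", mm=50% requires at least one term to match, so it is worth checking against both sample documents whether this gives the filtering you expect.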

Upvotes: 1

Ashutosh Tiwari

Reputation: 25

You can achieve this with the following steps.

  • Create a separate field in your schema named "source_text_fifty" (indexed=true, stored=false). Do not apply StandardTokenizerFactory to it; better, define a separate field type that uses solr.KeywordTokenizerFactory.
  • Calculate 50% of your input while indexing each document and store the result in the "source_text_fifty" field.
  • Re-index all existing data with the above logic.
  • Run the query source_text_fifty:"the sky". This returns only the one document that matches at 50%.
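The steps above do not spell out what "calculate 50% of your input" means. One possible reading, sketched here in Python, is to store the first half of the tokens as a single string so that a keyword-tokenized field can match it exactly. The helper name `first_half`, the whitespace split, and rounding up for odd token counts are all assumptions made for illustration.

```python
import math


def first_half(text: str) -> str:
    """Return the first 50% of the whitespace tokens, rounded up, as one string."""
    tokens = text.split()
    keep = math.ceil(len(tokens) / 2)
    return " ".join(tokens[:keep])


docs = [
    {"source_text": "the sky is blue"},
    {"source_text": "the sky is also called the celestial sphere"},
]

# Populate the extra field before sending the documents to Solr.
for doc in docs:
    doc["source_text_fifty"] = first_half(doc["source_text"])
```

Under this reading the first document stores "the sky" and the second stores "the sky is also", so an exact match on source_text_fifty:"the sky" returns only the first. It only matches queries equal to the stored prefix, which is narrower than a general 50% term match.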

Upvotes: 0
