Vlado Pandžić

Reputation: 5058

How to search phrase in Solr?

I am searching for a phrase in the Name field in Solr. I tried different configurations for Name: as type string and as a custom TextField.

  <fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true" />
  <fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.TrimFilterFactory"/>
      <filter class="solr.PatternReplaceFilterFactory" replace="all" replacement="" pattern="([^a-z])"/>
    </analyzer>
  </fieldType>

I defined Name like this:

Then I tried it as a string:

I also tried different combinations of tokenizers and filters, without success.

This is what I want: I have the phrase 'test split' and some entries whose Name is 'test', 'test 124', 'testblablabla' and 'test split 124'. What I find is that the 'test' entry is the first match in my example, while 'test split 124' ranks much, much lower, although it has more matching letters. Why is that?

I am testing in the Solr Admin UI, and my q (query) parameter is: Name:*test split*

EDIT 1:

I also tried creating a copyField called ExactName, with this field type:

  <fieldType name="exact" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

and I search like this:

Name:*test split* OR (ExactName:*test split*)^5.0 

'test' still ranks far above 'test split' :(
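For anyone reproducing this setup, a copyField like the one described is usually declared in the schema roughly as below. This is a sketch: the actual field definitions were not included in the question, so the `indexed`/`stored` values here are assumptions.

```xml
<!-- Assumed attributes: the question does not show the real ExactName definition -->
<field name="ExactName" type="exact" indexed="true" stored="false"/>
<!-- Copy the raw Name value into ExactName at index time, where the "exact" analyzer tokenizes it -->
<copyField source="Name" dest="ExactName"/>
```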

Upvotes: 3

Views: 7292

Answers (2)

Alessandro Benedetti

Reputation: 1114

First of all, what do you want? Do you want to return only results for your phrase, or to boost phrase matches above other types of matches?

The edismax query parser (and its parameters) is probably your solution. You can play with the mm parameter (which configures the minimum number of clauses that must match) and pf (which boosts phrase matches) [1][2].
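A minimal sketch of how those parameters could be set as defaults in solrconfig.xml, assuming a hypothetical handler named `/namesearch` and the `ExactName` field from the question. `qf` should point at a tokenized field: on a `string` or KeywordTokenizer field, a query term only matches the whole indexed value. The `mm` and boost values are illustrative, not tuned.

```xml
<!-- Hypothetical handler name; ExactName is the tokenized copyField from the question -->
<requestHandler name="/namesearch" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Use the Extended DisMax query parser -->
    <str name="defType">edismax</str>
    <!-- Fields that individual query terms are matched against -->
    <str name="qf">ExactName</str>
    <!-- Extra score for documents where the query terms appear together as a phrase -->
    <str name="pf">ExactName^10</str>
    <!-- Minimum match: at least one clause must match (illustrative value) -->
    <str name="mm">1</str>
  </lst>
</requestHandler>
```

With this in place, a plain `q=test split` (no wildcards) should match documents containing either term, and the `pf` boost is expected to lift 'test split 124' above 'test'. Adding `debugQuery=true` shows how each score was computed.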

If you only want the phrase to match, a quoted "test split" query should do the trick. Don't use * wildcard queries; use proper analysis to split the tokens instead. Wildcard queries are generally very inefficient.

[1] https://lucene.apache.org/solr/guide/6_6/the-extended-dismax-query-parser.html

[2] https://lucene.apache.org/solr/guide/6_6/the-dismax-query-parser.html#TheDisMaxQueryParser-Thepf_PhraseFields_Parameter

Upvotes: 3

drjz

Reputation: 657

Your approach to this problem is actually correct. There are multiple ways to do this. It is possible to solve it at query time by boosting span queries, but it is more efficient to also handle it at indexing time.

What is often done for name searching is indeed boosting phrases. You could add a filter to the exact fieldType: check out shingles, produced by the Shingle Filter, whose minShingleSize defaults to 2. Shingles are token n-grams.
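As a sketch, the `exact` fieldType from the question could gain a shingle filter like this. `minShingleSize`, `maxShingleSize` and `outputUnigrams` are ShingleFilterFactory attributes; the values chosen here (word pairs plus the single words) are an assumption about what suits name search.

```xml
<fieldType name="exact" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Emit two-word shingles ("test split") alongside the single words ("test", "split") -->
    <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="2" outputUnigrams="true"/>
  </analyzer>
</fieldType>
```

With this analyzer, 'test split 124' should be indexed as test, split and 124 plus the shingles 'test split' and 'split 124', so a document containing the word pair gets an extra matching term. Whether the query side also forms the shingle depends on whether the query parser splits on whitespace before analysis; checking the parsed query with `debugQuery=true` is the quickest way to confirm.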

You could also create a fieldType without lowercasing, again using the Shingle Filter, and populate it through an extra copyField.
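A sketch of that case-sensitive variant plus its copyField. The names `exact_cs` and `ExactNameCS` are hypothetical, the analyzer is the shingled one without `LowerCaseFilterFactory`, and the field attributes are assumptions.

```xml
<!-- Hypothetical case-sensitive type: no LowerCaseFilterFactory, so "Test" and "test" stay distinct -->
<fieldType name="exact_cs" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="2" outputUnigrams="true"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field, filled from Name at index time -->
<field name="ExactNameCS" type="exact_cs" indexed="true" stored="false"/>
<copyField source="Name" dest="ExactNameCS"/>
```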

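Then boosting the fields is the next step. If you use the eDisMax query parser, you can weight the fields with the qf parameter (and boost phrase matches with pf); bf is for boost functions rather than field weights. A sensible order is: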

  • Case-sensitive (no lowercasing) + shingles: highest boost
  • Case-insensitive (with lowercasing) + shingles: second-highest boost
  • Standard field: no boost
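One way to express these three tiers with edismax is through `qf` (per-field weights for term matches) and `pf` (extra weight for phrase matches). This sketch reuses the hypothetical `ExactNameCS` field, the `ExactName` field from the question, and assumes a hypothetical `NameText` field of a standard tokenized text type for the unboosted tier; the handler name and all weights are illustrative.

```xml
<!-- Hypothetical handler; field names other than ExactName are assumptions -->
<requestHandler name="/namesearch_tiered" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- Tiered term weights: case-sensitive shingles > case-insensitive shingles > standard field -->
    <str name="qf">ExactNameCS^10 ExactName^5 NameText</str>
    <!-- Additional phrase boosts on the shingled fields -->
    <str name="pf">ExactNameCS^20 ExactName^10</str>
  </lst>
</requestHandler>
```

The weights only set a relative order, so it is worth checking real rankings with `debugQuery=true` before settling on values.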

Upvotes: 1
