Reputation: 5058
I am searching for a phrase in Solr on the Name field. I tried different configurations for Name, making it either of type string or a custom TextField.
<fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true" />
<fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" replace="all" replacement="" pattern="([^a-z])"/>
  </analyzer>
</fieldType>
I defined Name like this:
then tried it as a string:
I also tried different tokenizer and filter combinations, without success.
This is what I want: I have the phrase 'test split', and I have some entries with Name 'test', 'test 124', 'testblablabla' and 'test split 124'.
What I found is that the 'test' entry is the first match in my example, and 'test split' ranks much, much lower although it has more matching letters.
Why is that?
I am testing using the Solr admin interface, and my q (query) parameter is:
Name:*test split*
EDIT 1:
I also tried to create a copyField called ExactName, which has this configuration:
<fieldType name="exact" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
and I search like this:
Name:*test split* OR (ExactName:*test split*)^5.0
Still 'test' comes much before 'test split' :(
Upvotes: 3
Views: 7292
Reputation: 1114
First of all, what do you want? Do you want to return only results for your phrase, or to boost phrase matches more than other types of matches?
The eDisMax query parser (and its parameters) is probably your solution. You can play with the mm parameter (which configures the minimum number of clauses that must match) and the pf parameter (which boosts phrase matches). [1]
If you just want the phrase to match, the query "test split" should do the trick. Don't use * wildcard queries; use proper analysis to split the tokens instead. Wildcard queries are very inefficient in general.
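As a rough illustration of the pf and mm suggestion, the following solrconfig.xml fragment sets eDisMax defaults on a search handler. The handler name /nameSearch and the boost and mm values are assumptions chosen for illustration; ExactName is the tokenized field from the question's edit. Treat it as a sketch under those assumptions, not a tuned configuration.

```xml
<!-- solrconfig.xml fragment; "/nameSearch" is a hypothetical handler name -->
<requestHandler name="/nameSearch" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Use the Extended DisMax query parser -->
    <str name="defType">edismax</str>
    <!-- qf: fields in which the individual query terms are searched -->
    <str name="qf">ExactName</str>
    <!-- pf: extra boost when the query terms occur together as a phrase
         (the value 10 is an arbitrary illustrative weight) -->
    <str name="pf">ExactName^10</str>
    <!-- mm: minimum number of query clauses that must match -->
    <str name="mm">1</str>
  </lst>
</requestHandler>
```

With defaults like these, the query would be sent as plain q=test split, without wildcards. Entries that contain both words next to each other should then score higher than entries that match only test.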
[1] https://lucene.apache.org/solr/guide/6_6/the-extended-dismax-query-parser.html
Upvotes: 3
Reputation: 657
Your approach to solving this problem is actually correct. There are multiple ways to do this. It is possible to solve it at query time by boosting span queries, but it is more efficient to also do this at indexing time.
What is often done for name searching is indeed boosting phrases. You could add a filter to the exact fieldType. Check out shingles, using the Shingle Filter with its default minShingleSize of 2. Shingles are token n-grams.
You could also create a fieldType without lowercasing, populated through an extra copyField and also using the Shingle Filter.
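A minimal sketch of a shingled field type, assuming the tokenizer and lowercase filter from the question's exact type. The names exact_shingle and NameShingle are hypothetical; only the ShingleFilterFactory attributes shown are standard Solr settings.

```xml
<!-- Schema fragment; "exact_shingle" and "NameShingle" are hypothetical names -->
<fieldType name="exact_shingle" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Emits adjacent word pairs as single tokens, e.g. "test split";
         outputUnigrams="true" keeps the single words as well -->
    <filter class="solr.ShingleFilterFactory"
            minShingleSize="2" maxShingleSize="2" outputUnigrams="true"/>
  </analyzer>
</fieldType>

<!-- Index-only copy of Name that is analyzed with the shingle type -->
<field name="NameShingle" type="exact_shingle" indexed="true" stored="false"/>
<copyField source="Name" dest="NameShingle"/>
```

For a Name of 'test split 124', this analysis chain would index the tokens test, test split, split, split 124 and 124, so a two-word phrase can match a single indexed token.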
Then boosting the fields is the next step. If you use the eDisMax query parser, you could use the qf parameter, which accepts per-field weights, to boost the fields:
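One way such field boosting could look is sketched below. In eDisMax, per-field weights are written as field^boost in the qf parameter. The handler name /nameBoost, the field NameShingle and all weights are assumptions for illustration; ExactName is the field from the question's edit.

```xml
<!-- solrconfig.xml fragment; "/nameBoost" and "NameShingle" are hypothetical names -->
<requestHandler name="/nameBoost" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- qf: matches in the shingled field weigh more than matches in the
         plain tokenized field (weights are illustrative only) -->
    <str name="qf">ExactName^2 NameShingle^5</str>
  </lst>
</requestHandler>
```

The relative weights would need tuning against real data; the point is only that the field holding word pairs contributes more to the score than the field holding single words.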
Upvotes: 1