Reputation: 163
I use Solr 6.4.1.
I need to search a field whose values can contain special symbols such as -, _ and ., but I also need to be able to find the entity without those symbols.
For example, if the value is G2-5SG, I should be able to find it with any of these queries: G2 5SG, G2-5SG, G25SG.
I have the following configuration for the type:
<analyzer type="index">
  <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(\w+)([-_.\s])" replacement="$1"/>
  <tokenizer class="solr.KeywordTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="16"/>
</analyzer>
<analyzer type="query">
  <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(\w+)([-_.\s])" replacement="$1"/>
  <tokenizer class="solr.KeywordTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
Searching with the special symbols works fine, but when I search for the word without the symbols, the server returns an empty set.
In the analyzer the values are marked as matching, with G2 5SG as the index value and G25SG as the query.
Upvotes: 0
Views: 607
Reputation: 15771
one thing that would work would be:
Upvotes: 1
Reputation: 8658
You can use
<tokenizer class="solr.StandardTokenizerFactory"/>
which splits the text field into tokens, treating whitespace and punctuation as delimiters, instead of
<tokenizer class="solr.KeywordTokenizerFactory"/>
which treats the entire text field as a single token.
You could try something like the following:
<fieldtype name="subword" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"/>
    <filter class="solr.FlattenGraphFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldtype>
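If the joined form (G25SG) also has to match, one possible variant of the field type above is sketched here. It is an untested assumption rather than a verified fix: the type name subword_catenate is hypothetical, and the attribute choices (catenateAll, preserveOriginal, lowercasing on both sides) are my guess at what the question needs. FlattenGraphFilterFactory is left out on the assumption that the non-graph WordDelimiterFilterFactory does not need it.

```xml
<!-- Hypothetical variant of the "subword" type; not tested against Solr 6.4.1 -->
<fieldType name="subword_catenate" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- catenateAll="1" also emits the joined form of the subwords;
         preserveOriginal="1" keeps the token as typed, delimiters included -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateAll="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- Same chain on the query side so all three query spellings
         are broken down the same way as the indexed value -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateAll="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

After changing the schema you would need to reindex, and it is worth checking the resulting tokens for G2-5SG, G2 5SG and G25SG in the admin Analysis screen before relying on this.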
For more details, please refer to the tokenizer page: Tokenizers.
Upvotes: 0