Reputation: 21
QA site: snknop38we.azurewebsites.net/
Example query: Solr: GETting 'q=(111 AND (published:True) AND ((entity_type_id:19)) AND ((available_start_date_time_utc : [* TO NOW]) OR (*:* -available_start_date_time_utc : [* TO *])) AND ((available_end_date_time_utc : [NOW TO *]) OR (*:* -available_end_date_time_utc : [* TO *]))), start=0, rows=20, qf=name short_description published=true is_out_of_stock=false, hl=true, hl.fl=name,short_description' from '/spell'
Expected results: VM11110xl Kramer
Current results:
Schema field type used for the name and short_description fields:
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
<analyzer type="index">
<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
<charFilter class="solr.HTMLStripCharFilterFactory"/>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ru.txt"/>
<filter class="solr.LowerCaseFilterFactory"/>
<!--<filter class="solr.SnowballPorterFilterFactory" language="Russian" protected="lang/protwords_lt.txt"/>-->
<filter class="solr.PorterStemFilterFactory"/>
</analyzer>
<analyzer type="query">
<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ru.txt"/>
<!--<filter class="solr.SynonymFilterFactory" synonyms="lang/synonyms_ru.txt" ignoreCase="true" expand="true"/>-->
<filter class="solr.LowerCaseFilterFactory"/>
<!--<filter class="solr.SnowballPorterFilterFactory" language="Russian" protected="lang/protwords_ru.txt"/>-->
<filter class="solr.PorterStemFilterFactory"/>
</analyzer>
</fieldType>
How should we modify our schema to support searching by numbers (e.g. finding "VM11110xl" when searching for "111")? We also don't want to lose the current search features.
Upvotes: 0
Views: 1404
Reputation: 52802
The main issue is that you want to match a substring of a token. Depending on exactly what you want to implement, adding an NGramFilterFactory to the analyzer chain can be a solution. You'll have to tune the gram sizes to get the hit ratio you're looking for, since the document will then also match queries for other substrings such as "110", depending on how your data is structured.
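A minimal sketch of the n-gram approach, assuming you keep the existing text_general type untouched and copy name and short_description into separate n-gram fields so the current search behaviour is preserved. The type name text_general_ngram, the field names name_ngram and short_description_ngram, and the gram sizes (3 to 15) are hypothetical choices, not values from the question. As advised in this answer, NGramFilterFactory appears only in the index-time analyzer, and stemming is left out because stemming n-grams makes little sense.

```xml
<!-- Hypothetical type: same base chain as text_general, plus n-grams at index time only. -->
<fieldType name="text_general_ngram" class="solr.TextField" positionIncrementGap="100" multiValued="true">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ru.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- "vm11110xl" yields grams such as "vm1", "111", "1110", "110x"; sizes are assumed, tune them -->
    <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <!-- No n-gram filter here: the query term "111" is matched against the indexed grams -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ru.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical extra fields, filled from the existing ones so text_general stays as it is -->
<field name="name_ngram" type="text_general_ngram" indexed="true" stored="false"/>
<field name="short_description_ngram" type="text_general_ngram" indexed="true" stored="false"/>
<copyField source="name" dest="name_ngram"/>
<copyField source="short_description" dest="short_description_ngram"/>
```

Assuming the request handler uses (e)dismax, adding the new fields to the query fields (for example qf=name short_description name_ngram short_description_ngram) should let q=111 hit VM11110xl while the original fields keep their stop-word and stemming behaviour. Queries shorter than minGramSize (e.g. "11") will not match the n-gram fields, and the index will grow noticeably, so check the sizes against your data.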
If you only want to match the start of each token, you can either use EdgeNGramFilterFactory or a wildcard query (field:111*). Keep in mind that wildcard queries skip parts of the query-time analysis chain (stemming, for example), so you're probably better off with an edge n-gram filter in that case.
Either way, add the n-gram filter only to the index-time analyzer, not to the query-time analyzer.
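For the prefix-only variant, a similar hypothetical type could use EdgeNGramFilterFactory instead, again at index time only; the type name text_general_edge and the gram sizes are assumptions. Edge n-grams are anchored at the start of each token: "vm11110xl" yields "vm", "vm1", "vm11", "vm111" and so on, so a query for "vm111" would match, but "111" on its own would not unless the token is first split at the letter/digit boundary (as in the WordDelimiterFilterFactory answer).

```xml
<!-- Hypothetical type for prefix matching: edge n-grams generated at index time only. -->
<fieldType name="text_general_edge" class="solr.TextField" positionIncrementGap="100" multiValued="true">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ru.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Emits prefixes of each token, from 2 up to 15 characters (assumed sizes) -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ru.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Compared with a wildcard query such as name:111*, the prefixes are computed once at index time, so the query side stays an ordinary term lookup.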
Upvotes: 1
Reputation: 12830
Use the schema below:
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
<analyzer type="index">
<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
<charFilter class="solr.HTMLStripCharFilterFactory"/>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ru.txt"/>
<filter class="solr.LowerCaseFilterFactory"/>
<!--<filter class="solr.SnowballPorterFilterFactory" language="Russian" protected="lang/protwords_lt.txt"/>-->
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.PorterStemFilterFactory"/>
</analyzer>
<analyzer type="query">
<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ru.txt"/>
<!--<filter class="solr.SynonymFilterFactory" synonyms="lang/synonyms_ru.txt" ignoreCase="true" expand="true"/>-->
<filter class="solr.LowerCaseFilterFactory"/>
<!--<filter class="solr.SnowballPorterFilterFactory" language="Russian" protected="lang/protwords_ru.txt"/>-->
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.PorterStemFilterFactory"/>
</analyzer>
</fieldType>
I have used WordDelimiterFilterFactory. It splits words into subwords, for example at transitions between letters and digits, following the rules described in this source:
Source: http://www.pathbreak.com/blog/solr-text-field-types-analyzers-tokenizers-filters-explained
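In newer Solr releases WordDelimiterFilterFactory is deprecated in favour of WordDelimiterGraphFilterFactory, which must be followed by FlattenGraphFilterFactory when used at index time. Below is a sketch of the same schema using it, keeping the parameters from this answer. Moving the delimiter filter ahead of LowerCaseFilterFactory is an assumption of this sketch, made so that splitOnCaseChange still sees the original casing; in the schema above, lowercasing runs first, which turns that option into a no-op.

```xml
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ru.txt"/>
    <!-- Split before lowercasing so case changes are still visible -->
    <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <!-- Required at index time after a graph-producing filter -->
    <filter class="solr.FlattenGraphFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ru.txt"/>
    <!-- No flattening needed at query time -->
    <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

With letter/digit splitting, "VM11110xl" is indexed as the parts "vm", "11110" and "xl", so queries for "vm" or "11110" match. A query for "111" alone still needs the n-gram approach from the first answer, since "111" is only a substring of "11110".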
Upvotes: 0