Reputation: 1698
In synonyms.txt, I have:
you're => you are
When I look at what the analysis tool gives for "Because you're mine", it is expanded into "Because you mine are". That is fine for full-text search, but it is a big problem for shingles. I wondered whether the expansion was simply placed at the end, but "you're Because mine" is expanded into "you because are mine": the following word gets inserted in between. I also tested "Because mine you're", which is expanded into "Because mine you are".
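The three results above fit a simple model of how the older (non-graph) SynonymFilter handles a rule whose output has more tokens than its input. The first output token replaces the matched token. Each extra output token is then stacked on the position of the next input token, or gets a fresh position if the input has already ended. The sketch below is a simplified Python model built on that assumption, not Lucene's code. `analyze`, `stack_synonyms` and `render` are hypothetical helpers, and tokenization is reduced to lowercasing plus whitespace splitting.

```python
from collections import deque

def analyze(text):
    # Simplified stand-in for StandardTokenizer + LowerCaseFilter:
    # "you're" stays a single token, as in the question.
    return text.lower().split()

def stack_synonyms(tokens, rule_input, rule_output):
    """Apply one single-token "=>" rule and return (position, token) pairs.

    Assumed legacy behaviour: the first output token replaces the matched
    token, and every extra output token is stacked on the position of the
    next input token (or on a new position once the input runs out).
    """
    result = []
    pending = deque()  # extra output tokens still waiting for a position
    pos = 0
    for tok in tokens:
        pos += 1
        matched = tok == rule_input
        slot = [rule_output[0]] if matched else [tok]
        if pending:
            slot.append(pending.popleft())  # stacked on the same position
        if matched:
            pending.extend(rule_output[1:])
        result.extend((pos, t) for t in slot)
    while pending:  # input exhausted: leftovers get fresh positions
        pos += 1
        result.append((pos, pending.popleft()))
    return result

def render(pairs):
    # Tokens in position order, roughly as the analysis screen lists them.
    return " ".join(t for _, t in pairs)

rule = ("you're", ["you", "are"])
for text in ["Because you're mine", "you're Because mine", "Because mine you're"]:
    print(text, "->", render(stack_synonyms(analyze(text), *rule)))
```

Running it prints the three orderings reported above, in lowercase. In the first case "mine" and "are" share position 3. That is why the words come out of order for anything that depends on token adjacency, such as shingles.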
Any idea about why this may happen?
Here's a screen cap of the analysis tool to make it 100% clear:
Upvotes: 1
Views: 671
Reputation: 556
You can use the Synonym-Expanding EDisMax Parser, which expands synonyms before text analysis takes place: https://github.com/healthonnet/hon-lucene-synonyms
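This illustrates why expanding before analysis avoids the ordering problem. If the synonym is substituted in the raw query text, the analyzer sees "you are" as two ordinary consecutive words, so positions follow the text. The following is only a rough Python sketch of that idea, not the plugin's implementation; how the plugin actually combines and weights the original and expanded queries is configured in the plugin itself and is not modeled here. `SYNONYMS`, `expand_query` and `positions` are hypothetical names.

```python
import re

# Hypothetical in-memory copy of synonyms.txt: source phrase -> replacement.
SYNONYMS = {"you're": "you are"}

def expand_query(query):
    """Return the original query plus one rewritten variant per matching rule.

    The rewrite happens on the raw query string, before any tokenizer or
    filter runs, so the replacement words keep their textual order.
    """
    variants = [query]
    for source, target in SYNONYMS.items():
        # Match the source phrase only as a whole, whitespace-delimited word.
        pattern = re.compile(r"(?<!\S)" + re.escape(source) + r"(?!\S)", re.IGNORECASE)
        if pattern.search(query):
            variants.append(pattern.sub(lambda m: target, query))
    return variants

def positions(text):
    # Simplified analysis: lowercase, split on whitespace, number from 1.
    return list(enumerate(text.lower().split(), start=1))

for variant in expand_query("Because you're mine"):
    print(positions(variant))
```

The expanded variant comes out as because(1) you(2) are(3) mine(4), with no stacked positions.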
Upvotes: 0
Reputation: 1420
The query analyzer section of my schema:
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.DictionaryCompoundWordTokenFilterFactory" dictionary="wordlists/english-common-nouns.txt" minWordSize="5" minSubwordSize="4" maxSubwordSize="15" onlyLongestMatch="true"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<!-- this filter can remove any duplicate tokens that appear at the same position - sometimes
possible with WordDelimiterFilter in conjunction with stemming. -->
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
I just let WDF (WordDelimiterFilter) do its tokenization, so you're => you re. In synonyms.txt I defined:
you re => you are
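It's not the most elegant approach, but it works, i.e. the tokens are stored in the order you need. The rule now maps two tokens to two tokens, so each output token simply takes over one input token's position instead of being stacked on the next word.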
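This models the workaround. It assumes, as described above, that WDF splits "you're" into "you" and "re", and that an equal-length rule gives each output token the position of the input token it replaces. It is a simplified Python model, not the real filter chain. `wdf_split` is only a rough stand-in for WordDelimiterFilter, and `analyze` and `apply_rule` are hypothetical helpers.

```python
import re

def wdf_split(token):
    # Rough stand-in for WordDelimiterFilter: split on non-alphanumeric
    # characters, so "you're" becomes ["you", "re"].
    return [part for part in re.split(r"[^0-9a-z]+", token) if part]

def analyze(text):
    # Lowercase, split on whitespace, then apply the WDF-style split.
    return [part for tok in text.lower().split() for part in wdf_split(tok)]

def apply_rule(tokens, rule_input, rule_output):
    """Apply an equal-length "=>" rule; output tokens reuse the input positions."""
    if len(rule_input) != len(rule_output):
        raise ValueError("this sketch only models equal-length rules")
    n = len(rule_input)
    out, i = [], 0
    while i < len(tokens):
        if tokens[i:i + n] == rule_input:
            out.extend(rule_output)  # one output token per input position
            i += n
        else:
            out.append(tokens[i])
            i += 1
    return list(enumerate(out, start=1))

rule = (["you", "re"], ["you", "are"])
for text in ["Because you're mine", "you're Because mine", "Because mine you're"]:
    print(apply_rule(analyze(text), *rule))
```

For "Because you're mine" this gives because(1) you(2) are(3) mine(4). Every token has its own position, so "you are" stays adjacent for shingling.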
Upvotes: 2