Reputation: 2270
I have indexed in solr shop names like
H&M
Lotte & Anna
fan & more
Tele2
Pure Tea
I have the following two issues (with importance in priority)
if I search for "H&M" I will never get any result. If I search for "te & Ann" I get the expected results.
if I search for "te & an" the results I get are Tele2 and Pure Tea whereas I would have expected "Lotte & Anna" to appear first in the list.
It appears as if the & character is not taken into consideration. What am I doing wrong here?
These are my analysers for the specific field (both query and index)
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
Ok, so the 1st problem was addressed with the WordDelimiterFilterFactory
specifying & => ALPHA
in the wdfftypes.txt
and changing switching from the StandardTokenizerFactory
to the WhitepsaceTokenizerFactory
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.WordDelimiterFilterFactory" types="wdfftypes.txt"/>
(edited in both analyser and query).
2nd question still remains. In the debugQuery I get the following
"debug": {
"rawquerystring": "te & an",
"querystring": "te & an",
"parsedquery": "text:te text:an",
"parsedquery_toString": "text:te text:an",
"explain": {
"": "\n0.8152958 = (MATCH) product of:\n 1.6305916 = (MATCH) sum of:\n 1.6305916 = (MATCH) weight(text:te in 498) [DefaultSimilarity], result of:\n 1.6305916 = score(doc=498,freq=1.0 = termFreq=1.0\n), product of:\n 0.8202942 = queryWeight, product of:\n 5.300835 = idf(docFreq=87, maxDocs=6491)\n 0.15474811 = queryNorm\n 1.9878132 = fieldWeight in 498, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 5.300835 = idf(docFreq=87, maxDocs=6491)\n 0.375 = fieldNorm(doc=498)\n 0.5 = coord(1/2)\n"
},
so, what should I modify so that the weights shift in favour of the desired result?
Upvotes: 0
Views: 169
Reputation: 3105
Use "NGramFilterFactory" instead of "EdgeNGramFilterFactory". That way, "Lotte & Anne", gets indexed into "lo, ot, tt, te, lot, ott, tte, lott, otte, lotte" and "an, nn, ne, ann, nne, anne". so when you search for "tte & ann", the document will match.
Upvotes: 2