Solr search query does not consider special character

Question

I have indexed shop names in Solr, such as:

H&M
Lotte & Anna
fan & more
Tele2
Pure Tea

I have the following two issues (in order of priority):

1. If I search for "H&M", I never get any results. If I search for "te & Ann", I get the expected results.
2. If I search for "te & an", the results I get are Tele2 and Pure Tea, whereas I would have expected "Lotte & Anna" to appear first in the list.

It appears as if the & character is not taken into consideration. What am I doing wrong here?

These are my analysers for the specific field (both query and index)

OK, so the first problem was solved by using the WordDelimiterFilterFactory with & => ALPHA specified in wdfftypes.txt, and by switching from the StandardTokenizerFactory to the WhitespaceTokenizerFactory (changed in both the index and query analysers).

The second question still remains. With debugQuery I get the following:

"debug": {
    "rawquerystring": "te & an",
    "querystring": "te & an",
    "parsedquery": "text:te text:an",
    "parsedquery_toString": "text:te text:an",
    "explain": {
      "": "
0.8152958 = (MATCH) product of:
  1.6305916 = (MATCH) sum of:
    1.6305916 = (MATCH) weight(text:te in 498) [DefaultSimilarity], result of:
      1.6305916 = score(doc=498,freq=1.0 = termFreq=1.0
), product of:
        0.8202942 = queryWeight, product of:
          5.300835 = idf(docFreq=87, maxDocs=6491)
          0.15474811 = queryNorm
        1.9878132 = fieldWeight in 498, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          5.300835 = idf(docFreq=87, maxDocs=6491)
          0.375 = fieldNorm(doc=498)
  0.5 = coord(1/2)
"
    },

So, what should I modify so that the weights shift in favour of the desired result?

Socratees Samipillai · Accepted Answer

Use "NGramFilterFactory" instead of "EdgeNGramFilterFactory". That way, "Lotte & Anna" gets indexed as "lo, ot, tt, te, lot, ott, tte, lott, otte, lotte" and "an, nn, na, ann, nna, anna", so when you search for "tte & ann", the document will match.
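
To see why this helps, here is a rough Python emulation of the two gram strategies. It is only a sketch, not Solr's implementation: the function names are made up for illustration, the gram sizes (2 to 5) are an assumption inferred from the gram list above, and the order in which Solr emits grams may differ between versions.

```python
def ngrams(token, min_size=2, max_size=5):
    """All substrings of length min_size..max_size (emulates an n-gram filter)."""
    return [
        token[i:i + n]
        for n in range(min_size, max_size + 1)
        for i in range(len(token) - n + 1)
    ]


def edge_ngrams(token, min_size=2, max_size=5):
    """Only the prefixes of length min_size..max_size (emulates a front edge n-gram filter)."""
    return [token[:n] for n in range(min_size, min(max_size, len(token)) + 1)]


# Hypothetical lowercased tokens for the shop name "Lotte & Anna".
lotte_grams = ngrams("lotte")
anna_grams = ngrams("anna")
lotte_edge_grams = edge_ngrams("lotte")

print(lotte_grams)
print(anna_grams)
print("te" in lotte_edge_grams)  # edge grams are prefixes only
print("te" in lotte_grams)       # plain n-grams include inner substrings
```

With edge grams, only the prefixes of "lotte" are indexed, so a query term like "te" or "tte" cannot hit that token. With plain n-grams, the inner substrings are indexed as well, which is what lets "tte & ann" match.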
