Aswin Raghavan

Reputation: 321

Remove punctuation in Solr search

I am new to Solr. I have a document indexed in Solr, e.g.:

{
  "foodType": "basicFood",
  "fulltext": [
    "basicFood",
    "3.718625",
    "1 tbsp",
    "Butter, salted"
  ],
  "slims": "3.718625",
  "displayText": "1 tbsp",
  "displayName": "Butter, salted"
}

When I search for butter the result is empty, but it works fine for the query "butter," (with the trailing comma). How can I make it work for butter as well?

Upvotes: 2

Views: 3076

Answers (1)

YoungHobbit

Reputation: 13402

Add the following filter to your field type's analyzer, for both the index and the query phase:

<filter class="solr.PatternReplaceFilterFactory" pattern="([^A-Za-z0-9])" replacement="" replace="all"/>

This will remove every character from each token except a-z, A-Z and 0-9. To test it you will need to re-index your data, because the tokens already in the index still contain the punctuation. Alternatively, you can try it in the Analysis screen of the Solr Admin UI.
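For context, a minimal sketch of a field type that applies this filter at both index and query time might look like the following. It assumes you can edit schema.xml (or managed-schema), a hypothetical field type name `text_nopunct`, a `WhitespaceTokenizerFactory` tokenizer (which would keep "Butter," as a single token, matching the symptom in the question), and that the `fulltext` field is the one being searched. The `LowerCaseFilterFactory` and `LengthFilterFactory` lines are my additions, not part of the answer above.

```xml
<!-- Hypothetical field type name; adjust to your schema. -->
<fieldType name="text_nopunct" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Split on whitespace only: "Butter, salted" becomes "Butter," and "salted" -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Strip every character that is not a-z, A-Z or 0-9: "Butter," becomes "Butter" -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^A-Za-z0-9])" replacement="" replace="all"/>
    <!-- Assumption: lowercase so that "Butter" matches the query "butter" -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Assumption: drop tokens that became empty (a token made only of punctuation) -->
    <filter class="solr.LengthFilterFactory" min="1" max="255"/>
  </analyzer>
  <analyzer type="query">
    <!-- Same chain at query time, so queries are normalized the same way -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^A-Za-z0-9])" replacement="" replace="all"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.LengthFilterFactory" min="1" max="255"/>
  </analyzer>
</fieldType>

<!-- Assumption: the searched field is "fulltext" -->
<field name="fulltext" type="text_nopunct" indexed="true" stored="true" multiValued="true"/>
```

With this chain, "Butter, salted" is indexed as `butter` and `salted`, so the query butter matches. Note that the pattern also strips the decimal point from numbers, so "3.718625" is indexed as `3718625`.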

Another approach is to use a different tokenizer instead of StandardTokenizerFactory in the analyzer. You can use LetterTokenizerFactory, which creates tokens from strings of contiguous letters; any non-letter characters are discarded. However, this can create many extra tokens that you might not want, so please check the results before you switch.

Example: "I can't" ==> "I", "can", "t"
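A sketch of the LetterTokenizerFactory alternative, again with a hypothetical field type name (`text_letters`). A single `<analyzer>` without a `type` attribute is used for both indexing and querying; the lowercase filter is an assumption added so that "Butter" matches "butter".

```xml
<!-- Hypothetical field type name; adjust to your schema. -->
<fieldType name="text_letters" class="solr.TextField" positionIncrementGap="100">
  <!-- No type attribute: the same analyzer is used at index and query time -->
  <analyzer>
    <!-- Tokens are runs of letters; punctuation, digits and spaces act as separators -->
    <tokenizer class="solr.LetterTokenizerFactory"/>
    <!-- Assumption: lowercase for case-insensitive matching -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Because digits are not letters, a value like "3.718625" produces no tokens at all and "1 tbsp" yields only `tbsp`. That is exactly the kind of side effect worth checking in the Analysis screen first.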

Update

If you need to use WordDelimiterFilter, try the configuration below:

<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" catenateWords="1" generateNumberParts="0" splitOnCaseChange="0" splitOnNumerics="0"/>

This splits words at delimiters and also concatenates the word parts back together. All the other splits are turned off: numeric parts, camel-case changes and transitions from alpha to numeric. If required, you can turn any of them on by setting the corresponding attribute to a non-zero value.
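A sketch of a complete field type around that filter, with a hypothetical name `text_wdf`. The whitespace tokenizer and the lowercase filter are assumptions, not part of the answer; the filter line is the one suggested above.

```xml
<!-- Hypothetical field type name; adjust to your schema. -->
<fieldType name="text_wdf" class="solr.TextField" positionIncrementGap="100">
  <!-- No type attribute: the same analyzer is used at index and query time -->
  <analyzer>
    <!-- Assumption: whitespace tokenizer, so "Butter," reaches the filter intact -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Splits on delimiters such as ",", so "Butter," produces the word part "Butter" -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" catenateWords="1" generateNumberParts="0" splitOnCaseChange="0" splitOnNumerics="0"/>
    <!-- Assumption: lowercase for case-insensitive matching -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Here "Butter," yields `butter`. With generateNumberParts="0", purely numeric values such as "3.718625" may produce no tokens; set generateNumberParts="1" or preserveOriginal="1" if you need to search on them.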

Upvotes: 2
