Paul

Reputation: 429

Solr Queries With Dashes

I am currently using Solr's edismax query parser for searches on our website. What I'd like to do is essentially have dashes ignored.

So if I search for "wi-fi adapter" and I have a document with the title "wifi adapter", I get no results.

I am currently using solr.MappingCharFilterFactory to map dashes to spaces. This is what my text_general fieldtype looks like in my schema.

  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
    <analyzer type="index">
      <tokenizer class="solr.ClassicTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
      <charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <charFilter class="solr.HTMLStripCharFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.ClassicTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
      <charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
      <filter class="solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <charFilter class="solr.HTMLStripCharFilterFactory"/>
    </analyzer>
  </fieldType>

My mapping.txt contains this line:

"-" => " "

This rule converts each dash to a space.

So if I search "wi-fi adapter", it always shows the same results as "wi fi adapter", but it won't show results for "wifi adapter".

Is there any way to treat dashes like this? Essentially I'd want to treat "wifi adapter", "wi-fi adapter", and "wi fi adapter" the same.

Upvotes: 2

Views: 1074

Answers (1)

Abhijit Bashetti

Reputation: 8658

You can use the WordDelimiterGraphFilterFactory in your analyzer. It has many attributes; I have listed a few of them below.

generateWordParts : (integer, default 1) If non-zero, splits words at delimiters. For example: "CamelCase", "hot-spot" → "Camel", "Case", "hot", "spot"

preserveOriginal : (integer, default 0) If non-zero, the original token is preserved: "Zap-Master-9000" → "Zap-Master-9000", "Zap", "Master", "9000"

catenateWords : (integer, default 0) If non-zero, maximal runs of word parts will be joined: "hot-spot-sensor’s" → "hotspotsensor"
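
As a rough illustration of how these three attributes combine, here is a sketch for the input "wi-fi" from the question. The token lists in the comments are derived from the attribute descriptions above, not from an actual analysis run, so it is worth confirming them on the Analysis screen of the Solr admin UI.

```xml
<!-- generateWordParts="1": "wi-fi" is split into "wi", "fi" -->
<!-- preserveOriginal="1":  the original token "wi-fi" is kept as well -->
<!-- catenateWords="1":     the joined form "wifi" is also emitted -->
<filter class="solr.WordDelimiterGraphFilterFactory"
        generateWordParts="1"
        preserveOriginal="1"
        catenateWords="1"/>
```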

In your case, the field type would look something like this:

<fieldType name="text_wd" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Splits text on whitespace characters -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Splits words at delimiters, keeps the original token, and also emits the joined form -->
    <filter class="solr.WordDelimiterGraphFilterFactory" preserveOriginal="1" catenateWords="1"/>
    <!-- Needed after a graph filter at index time, because the indexer cannot consume a token graph -->
    <filter class="solr.FlattenGraphFilterFactory"/>
    <!-- Transforms text to lower case -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>

  <analyzer type="query">
    <!-- Splits text on whitespace characters -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Transforms text to lower case -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
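
One case may be worth checking with this setup: a document whose title only contains "wifi" is indexed as the single token "wifi", while the query "wi-fi" stays "wi-fi" after whitespace tokenization and lowercasing. A possible variation, which is an assumption on my part and not something tested against your data, is to apply the delimiter filter on the query side as well, so that the query "wi-fi" also produces "wifi":

```xml
<!-- Hypothetical variation of the query analyzer; verify on the Analysis screen before relying on it -->
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- For the query "wi-fi" this additionally yields "wi", "fi" and the joined form "wifi" -->
  <filter class="solr.WordDelimiterGraphFilterFactory" preserveOriginal="1" catenateWords="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```

Note that the query "wi fi" (with a space) is split by the tokenizer before the filter sees it, so this variation does not join it into "wifi"; that case would need something else, for example an entry in synonyms.txt. With edismax, multi-token output from a graph filter may also depend on the sow parameter being false.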

More information can be found in the Solr documentation, under Filters available in Solr.

Upvotes: 4
