dc10

Reputation: 2198

Solr does not find substring

I have a Rails 4 application running Sunspot Solr with the following filters in schema.xml:

<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
    <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="10"/>
    <filter class="solr.TrimFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="10"/>
    <filter class="solr.TrimFilterFactory" />
  </analyzer>
</fieldType>

I have a city named "Alpe d'Huez" which I want Solr to find. Solr only finds this record if you start typing "Alpe", but I want it to be found by typing just "huez". How can this be achieved? Thanks for your help.

Upvotes: 1

Views: 329

Answers (2)

femtoRgon

Reputation: 33341

Right off, your analysis looks a bit questionable.

First, it's most typical for your query and index analyzers to be identical, or nearly so (this isn't a hard and fast rule, but if they diverge you should know why). If they are too different, the query terms won't match up well with the indexed terms, and you'll often get no results.

Using both EdgeNGramFilterFactory and NGramFilterFactory is pretty odd. Essentially, you split the tokens into ngrams, and split out ngrams from your ngrams. This doesn't strike me as particularly useful, unless you are really intending to take the shotgun blast approach to searching.
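To make the difference between the two filters concrete, here is a rough sketch of each one used on its own, with the grams it would emit for a single lowercased token `huez` noted in comments. The type names `text_prefix` and `text_substring` are made up for illustration, and the order in which grams are emitted can vary between Solr versions.

```xml
<!-- Hypothetical type: edge n-grams only, so only prefixes of a token match. -->
<fieldType name="text_prefix" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- "huez" -> hu, hue, huez -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
</fieldType>

<!-- Hypothetical type: full n-grams, so any substring of a token matches. -->
<fieldType name="text_substring" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- "huez" -> h, u, e, z, hu, ue, ez, hue, uez, huez -->
    <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="10"/>
  </analyzer>
</fieldType>
```

Chaining the second filter after the first runs the full n-gram expansion again on every prefix, which mostly produces duplicates of grams the n-gram filter would have emitted anyway.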

You are applying a stemmer (PorterStemFilterFactory) in your query-time analysis, but not at index time. Your stemmer should be applied at both index and query time for it to be useful.

Further, NGrams and Stemmers don't really play well together. If you need to use both, you should probably index them in different fields.
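As a minimal sketch of what indexing them in different fields could look like: one stemmed type, one n-gram type, and a `copyField` that feeds the same source text into both. All type and field names here (`text_stem`, `text_ngram`, `city_stem`, `city_ngram`) are hypothetical, and a Sunspot-generated schema may need these wired up through its own dynamic fields instead.

```xml
<!-- Hypothetical stemmed type; a single <analyzer> applies at both index and query time. -->
<fieldType name="text_stem" class="solr.TextField" omitNorms="false">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical n-gram type for substring matching, with no stemmer. -->
<fieldType name="text_ngram" class="solr.TextField" omitNorms="false">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="10"/>
  </analyzer>
</fieldType>

<!-- Hypothetical fields: copyField copies the raw input text, before any analysis. -->
<field name="city_stem" type="text_stem" indexed="true" stored="true"/>
<field name="city_ngram" type="text_ngram" indexed="true" stored="false"/>
<copyField source="city_stem" dest="city_ngram"/>
```

A query can then search either field, or both, depending on whether stemmed or substring matching is wanted.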

A minor point on TrimFilterFactory: it doesn't actually do anything here. You're using StandardTokenizer, so the input has already been split on whitespace and the tokens carry no leading or trailing spaces to trim. TrimFilterFactory is rarely useful on anything but keyword-analyzed fields.
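For contrast, this is roughly the kind of keyword-analyzed field where trimming does matter. The type name `keyword_trimmed` is invented for this sketch.

```xml
<!-- Hypothetical type: the whole input stays one token, so stray spaces survive tokenization. -->
<fieldType name="keyword_trimmed" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- "  Alpe d'Huez " becomes "Alpe d'Huez" -->
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```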

If you aren't sure how you need to analyze, it might be most useful to just start with standard analysis:

<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer type="index" class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
  <analyzer type="query" class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
</fieldType>

And go from there.

Otherwise, something like this is probably the closest reasonable version of what you've provided:

<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="10"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="10"/>
  </analyzer>
</fieldType>

Upvotes: 1

Rahul Sharma

Reputation: 5824

Try the configuration below. If the search term contains special characters, enclose it in double quotes.

<fieldType name="search" class="solr.TextField" positionIncrementGap="150">
    <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="50"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>
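
A sketch of how this type might be wired to a field, with a walk-through of what the two analyzers should produce for the city in the question. The field names `city` and `city_search` are assumptions for illustration only and are not part of the configuration above; a Sunspot-generated schema may need the type attached through its own field definitions instead.

```xml
<!-- Hypothetical wiring: "city" is assumed to be an existing source field. -->
<field name="city_search" type="search" indexed="true" stored="false"/>
<copyField source="city" dest="city_search"/>
<!--
  Index side for "Alpe d'Huez":
    KeywordTokenizer  -> "Alpe d'Huez"  (one token; space and apostrophe are kept)
    LowerCaseFilter   -> "alpe d'huez"
    NGramFilter 1..50 -> every substring of that token, including "huez"
  Query side for "huez":
    KeywordTokenizer and LowerCaseFilter -> "huez", which equals an indexed gram.
-->
```

Because the query analyzer does not split or n-gram its input, the typed text has to equal one of the indexed substrings as a whole for a match.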

Upvotes: 2
