Muss Mesmari

Reputation: 133

Why don't I get results when I search for partial words in Solr?

I am trying to figure out the correct order for my analyzer in Solr, but I get no results when I search for partial words. For instance:

Query: Sto

Desired results: Stockholm

Query: Sweden is

Desired results: Sweden is a European city

I only receive results when I search for the complete text, i.e. the desired result itself. I would be thankful for any hints or tips on what may be wrong with what I have done so far:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
        <analyzer type="index">
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
            <filter class="solr.LengthFilterFactory" min="2" max="15"/>
            <filter class="solr.PorterStemFilterFactory"/>
            <filter class="solr.FlattenGraphFilterFactory"/>
            <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
            <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
        <analyzer type="query">
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
            <filter class="solr.LengthFilterFactory" min="2" max="15"/>
            <filter class="solr.PorterStemFilterFactory"/>
            <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
            <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
    </fieldType>

Upvotes: 0

Views: 344

Answers (2)

Muss Mesmari

Reputation: 133

I managed to find the missing piece. My mistake was that I had indexed my fields as string. Fields of type string are indexed as a single verbatim value without any analysis. Therefore, I couldn't search for partial words or parts of the string.
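As a rough sketch of that fix, assuming a hypothetical field called `city` (the field name and its attributes are illustrative, not taken from my actual schema), the declaration would move from the `string` type to the analyzed type:

```xml
<!-- Before: a string field keeps the whole value as one unanalyzed token -->
<!-- <field name="city" type="string" indexed="true" stored="true"/> -->

<!-- After: the field uses the analyzed text_general type from the question -->
<field name="city" type="text_general" indexed="true" stored="true"/>
```

After changing the field type, the documents have to be reindexed before the new analysis takes effect.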

Upvotes: -1

Abhijit Bashetti

Reputation: 8678

You are tokenizing the text by applying the field type text_general.

In order to get partial word matches you have to change the tokenizer.

Try using the N-Gram Tokenizer in this case.

Reads the field text and generates n-gram tokens of sizes in the given range.

Factory class: solr.NGramTokenizerFactory

Arguments:

minGramSize: (integer, default 1) The minimum n-gram size, must be > 0.

maxGramSize: (integer, default 2) The maximum n-gram size, must be >= minGramSize.

Example:

Default behavior. Note that this tokenizer operates over the whole field. It does not break the field at whitespace. As a result, the space character is included in the encoding.

<analyzer>
  <tokenizer class="solr.NGramTokenizerFactory"/>
</analyzer>
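A variant with explicit gram sizes, as a sketch only: the values 2 and 15 are borrowed from the EdgeNGram filter in the question and are an assumption, not a recommendation.

```xml
<analyzer>
  <!-- Emits every substring of 2 to 15 characters; spaces are part of the grams -->
  <tokenizer class="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="15"/>
</analyzer>
```

Larger ranges produce many more tokens per value, so the index grows accordingly.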

For the second case you will get results, but there you are looking for a phrase match. For text like that you need to use the text_general field type. Also try the edismax query parser and check the results.

One more thing: you can verify how your field type analyzes text on the Analysis page of the Solr admin UI.

Upvotes: 2
