Petar Yakov

Reputation: 179

Fuzzy search a part of the whole text in Solr

I have the following field declaration for my Solr index:

<field name="description" type="text_ci" indexed="true" multiValued="false" required="true"/>

Field type:

<fieldType name="text_ci" class="solr.TextField" omitNorms="true" sortMissingLast="true">
    <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType> 

The index contains documents whose description value looks like "Accomodation in {city}" (each document has a different city).

I want a fuzzy search to return results when I enter a misspelled term, for example acomodation~2, but this is difficult because "accomodation" is only part of the text.

I am thinking of using NGramFilter to tokenize the input, but I am not sure whether this is the right approach or how to implement it.

What can I do?

Upvotes: 1

Views: 1647

Answers (1)

Abhijit Bashetti

Reputation: 8658

Lucene supports fuzzy searches based on the Levenshtein distance (edit distance) algorithm. To run a fuzzy search, put the tilde symbol, "~", at the end of a single-word term.
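To make the edit distance idea concrete, here is a minimal sketch of the plain Levenshtein distance in Python. It is only an illustration of the metric, not how Lucene computes fuzzy matches internally, and the city "paris" in the second call is a made-up example value.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the single-character insertions, deletions, and
    substitutions needed to turn a into b."""
    # previous[j] is the distance between the prefix of a handled so far and b[:j]
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep a matching character
            ))
        previous = current
    return previous[-1]


# Distance to the single word is small ...
print(levenshtein("acomodation", "accomodation"))
# ... but distance to the whole description (hypothetical city) is large.
print(levenshtein("acomodation", "accomodation in paris"))
```

The second call shows why a fuzzy term cannot match when the entire description is indexed as one term: the remaining words all count as edits.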

I don't see a need for NGramFilter here.

The ~ operator runs a fuzzy search. Append it to each single term; you can optionally follow it with an edit distance, as below.

{FIELD_NAME}:{TERM}~{EDIT_DISTANCE}

Your request will look like this:

http://localhost:8983/solr/FuzzySearchExample/select?indent=on&q=desc:Samsu~&wt=json&fl=id,desc
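As a rough sketch, a request like the one above could be assembled in Python as follows. The helper name `build_fuzzy_url` is hypothetical, the core name is taken from the URL above, and the field name `description` comes from the question. The sketch assumes the term is a single word with no query-parser special characters, so it does no escaping.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_fuzzy_url(base_url: str, field: str, term: str, max_edits: int = 2) -> str:
    """Build a Solr /select URL for a fuzzy query on one single-word term.

    Hypothetical helper: assumes `term` needs no escaping.
    """
    params = {
        "q": f"{field}:{term}~{max_edits}",  # FIELD:TERM~EDIT_DISTANCE
        "fl": f"id,{field}",                 # fields to return
        "wt": "json",
        "indent": "on",
    }
    # urlencode percent-encodes reserved characters such as ':' in the query
    return f"{base_url}/select?{urlencode(params)}"


url = build_fuzzy_url(
    "http://localhost:8983/solr/FuzzySearchExample", "description", "acomodation"
)
print(url)
# Decode the query string again to check what Solr would receive as q
print(parse_qs(urlsplit(url).query)["q"][0])
```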

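I used the field type below. Note that the index analyzer uses StandardTokenizerFactory instead of KeywordTokenizerFactory, so each word of the description is indexed as its own term.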

<fieldType name="text_ci" class="solr.TextField" omitNorms="true" sortMissingLast="true">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>
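The effect of switching the index tokenizer can be sketched in Python. The two tokenizer functions below are crude stand-ins (whole lowercased string versus a lowercase whitespace split), not Solr's actual implementations, and the sample description with the city "Paris" is a made-up value.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def keyword_tokens(text: str) -> list[str]:
    # Stand-in for KeywordTokenizer + LowerCaseFilter: one term for the whole value
    return [text.lower()]


def standard_tokens(text: str) -> list[str]:
    # Rough stand-in for StandardTokenizer + LowerCaseFilter: one term per word
    return text.lower().split()


def fuzzy_match(indexed_terms: list[str], term: str, max_edits: int) -> bool:
    # A fuzzy term matches if any indexed term is within max_edits of it
    return any(levenshtein(t, term) <= max_edits for t in indexed_terms)


description = "Accomodation in Paris"  # hypothetical document value
print(fuzzy_match(keyword_tokens(description), "acomodation", 2))   # whole-value term
print(fuzzy_match(standard_tokens(description), "acomodation", 2))  # per-word terms
```

With the whole value indexed as a single term, the fuzzy term is too far away to match. With per-word terms, it is within one edit of "accomodation".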

I get the response below for acomodation~2 or acomodation~1:

Screenshot of Solr query page

And I get the response below for acomodation (without the ~ operator):

Screenshot of query page2

Upvotes: 2
