alex.bour

Reputation: 2964

Fuzzy search with Solr and Sunspot

I have installed Solr and the Sunspot gem for my Rails 3.0 app.

My goal is to do fuzzy search. For example, I want the search term "Chatuea Marguxa" to match "Château Margaux".

Currently, only exact words are matched, so fuzzy matching doesn't work at all.

My model:

  searchable do
    text :winery
  end 

My controller:

   search = Wine.search do
     fulltext 'Chatuea Marguxa'
   end 

The Solr schema I tried, with n-grams:

<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
</fieldType>

I also tried with double metaphone:

<analyzer>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.StandardFilterFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.PorterStemFilterFactory"/>
  <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/>
</analyzer>

In both cases, I got 0 results (after reindexing, of course).

What did I do wrong?

Upvotes: 3

Views: 1201

Answers (2)

PeterMacko

Reputation: 882

Try appending the '~' character to every word in the query, like this: Chatuea~ Marguxa~. This is the fuzzy operator implemented in Lucene: http://lucene.apache.org/core/3_6_0/queryparsersyntax.html#Fuzzy%20Searches
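
As a rough sketch of that idea, the query string could be rewritten in plain Ruby before it is handed to the search. The `fuzzify` helper below is a hypothetical name, not part of Sunspot or Solr, and it assumes terms are separated by whitespace. Whether the `~` is then treated as a fuzzy operator depends on the query parser your Solr request handler uses; some parsers may treat it as literal text.

```ruby
# Hypothetical helper (not part of Sunspot or Solr): append Lucene's
# fuzzy operator "~" to every whitespace-separated term in the query.
def fuzzify(query)
  query.split.map { |term| "#{term}~" }.join(' ')
end

puts fuzzify('Chatuea Marguxa')
```

The resulting string is what you would pass on as the search keywords in place of the raw user input.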

Upvotes: 1

Abram

Reputation: 41874

Some searching around turned up the fuzzily gem. From its README:

Anecdotal benchmark: against our whole Geonames-derived table of locations (3.2M records, about 1GB of data), on my development machine (a 2011 MacBook Pro):

- searching for the top 10 matching records takes 6ms ±1
- preparing the index for all records takes about 10min
- the DB query overhead when changing a record is at 3ms ±2
- the memory overhead (footprint of the trigrams table index) is about 300MB
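
To give a feel for why a trigram index tolerates typos, here is a minimal plain-Ruby sketch of trigram matching. It is not fuzzily's actual implementation: the `trigrams` and `trigram_similarity` helpers, the space padding, and the Jaccard scoring are all assumptions made for illustration, and the candidate list is made up.

```ruby
require 'set'

# Break a string into a set of lowercase character trigrams.
# The string is padded with spaces so word edges produce trigrams too.
def trigrams(text)
  padded = "  #{text.downcase} "
  (0..padded.length - 3).map { |i| padded[i, 3] }.to_set
end

# Jaccard similarity of the two trigram sets:
# 0.0 when no trigram is shared, 1.0 when the sets are identical.
def trigram_similarity(a, b)
  ta = trigrams(a)
  tb = trigrams(b)
  (ta & tb).size.to_f / (ta | tb).size
end

# Illustrative data only.
candidates = ['Château Margaux', 'Château Latour', 'Opus One']
best = candidates.max_by { |name| trigram_similarity('Chatuea Marguxa', name) }
puts best
```

Even with two misspelled words, the query still shares trigrams such as "mar" and "arg" with the intended name, so it scores higher than the other candidates.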

Upvotes: 0
