alex.bour
alex.bour

Reputation: 2964

Rails : what rails fuzzy method/gem/plugin use to search in a 1 million records database table?

I own a ~1 million records MySQL table. I will need soon to add search in my Rails 3.x app. I want the search to be fuzzy.

Actually, I use a plugin (rails-fuzzy-search) for another table but it's only 3000 records. This plugin create trigrams in another table (25000 trigrams for the 3000 records table).

Well, I can't use this method for my 1 million records table else my trigrams table will be maybe 100 millions records !

I see some gems:

I don't know what is the best solution for better performances.

The search will be set for two fields of my table.

Upvotes: 1

Views: 1419

Answers (2)

David Cook
David Cook

Reputation: 563

Depending on your database, you may be able to enable database-specific features to achieve some "fuzzy matching" (whatever that means exactly).

MySQL has "Natural Language Mode", which is explained here.

This is fairly easy to implement. First add an index in a migration like so:

add_index :products, [:name, :description], type: :fulltext, name: :fulltext_index_products_on_name_and_description

Then you can use the MATCH clause in your query like so:

scope :search, ->(query) {
  where("MATCH (products.name, products.description) AGAINST (? IN NATURAL LANGUAGE MODE)", query)
  # MySQL should sort by relevance, so make sure not to override this with your own order clause.
}

Be aware you may get unexpected results testing on a small dataset, it's best to try it with full data. You may also find "BOOLEAN MODE" worth looking into.

Upvotes: 0

Abram
Abram

Reputation: 41884

some searching around revealed fuzzily gem:

Anecdotical benchmark: against our whole Geonames-derived table of locations (3.2M records, about 1GB of data), on my development machine (a 2011 MacBook Pro)

searching for the top 10 matching records takes 6ms ±1 preparing the index for all records takes about 10min the DB query overhead when changing a record is at 3ms ±2 the memory overhead (footprint of the trigrams table index) is about 300MB

Also, check out Solr and Sunspot

Upvotes: 2

Related Questions