Reputation: 1104
I have a Rails 3 app that contains Article objects. They have a title attribute. Before adding a new article, people are supposed to search to see if an article with that title already exists.
Today someone reported a duplicate article. Turns out whoever added it had searched for it first, but there was an umlaut over an "o" in the title. They searched without the umlaut using a regular "o" character, didn't find it, and added the duplicate.
I'm doing a simple find on the title attribute with a scope, as below:
scope :search, lambda { |term| where('title like ?', "%#{term}%") }
I'm wondering if there's a simple way to "ignore" diacritics, so that the person could type an "o" and still find an article if the o has an umlaut, and the same for other diacritics.
I've considered creating a search_title attribute and populating it myself on update, replacing the diacritics with their plain equivalents, but that has its own problems. Among them: what if someone then does type the diacritic in their search?
I was hoping there might be an easy solution for this, but I'm not holding out much hope. :-)
Upvotes: 0
Views: 221
Reputation: 49104
Yes, a standard way to handle this is to maintain a shadow search field. In addition to converting all the data to ASCII, consider the following:
An alternative strategy is to compute a Soundex code for each title and search on that (or use a revised version of Soundex). There are Ruby libraries for Soundex, or you can write your own.
Soundex will give you more false positives. You need to decide whether you'd rather have more false positives or risk missing a match (a false negative) because one title was "Plague" and the other was "Plagues".
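If you do write your own, a minimal sketch of classic American Soundex in plain Ruby might look like the following. The method name soundex and the constant SOUNDEX_CODES are made up for illustration and are not taken from any particular library. Note that this sketch simply drops anything outside A-Z, so accented letters would need to be converted to ASCII first.

```ruby
# Letter-to-digit table for classic American Soundex.
# Vowels, H, W and Y have no digit and are therefore absent.
SOUNDEX_CODES = {
  'B' => '1', 'F' => '1', 'P' => '1', 'V' => '1',
  'C' => '2', 'G' => '2', 'J' => '2', 'K' => '2',
  'Q' => '2', 'S' => '2', 'X' => '2', 'Z' => '2',
  'D' => '3', 'T' => '3',
  'L' => '4',
  'M' => '5', 'N' => '5',
  'R' => '6'
}.freeze

# Hypothetical helper: returns a 4-character Soundex code for a single word.
def soundex(word)
  # Assumption: input is already ASCII; any other character is discarded.
  letters = word.upcase.gsub(/[^A-Z]/, '')
  return '' if letters.empty?

  code = letters[0].dup
  previous = SOUNDEX_CODES[letters[0]]

  letters[1..-1].each_char do |ch|
    # H and W are skipped without breaking a run of identical digits.
    next if ch == 'H' || ch == 'W'

    digit = SOUNDEX_CODES[ch]
    code << digit if digit && digit != previous
    # A vowel (digit is nil) resets the run, so a repeated digit counts again.
    previous = digit
  end

  # Truncate or zero-pad to the usual letter-plus-three-digits form.
  code[0, 4].ljust(4, '0')
end
```

With this sketch, soundex("Robert") and soundex("Rupert") produce the same code, which shows the kind of false positive to expect when searching on it.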
You could also use a real full-text search system, either by enabling MySQL's built-in full-text search or by running a separate search system.
Upvotes: 1
Reputation: 1533
I suggest creating a search_title field and storing title.to_ascii_brutal in it (use this plugin: https://github.com/tomash/ascii_tic). Then change your search scope to:
scope :search, lambda { |term| where('search_title like ?', "%#{term.to_ascii_brutal}%") }
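Because the search term is normalized the same way as the stored value, a search typed with the umlaut and one typed without it both match. If you'd rather not depend on the plugin, here is a rough plain-Ruby sketch of the same idea. It assumes Ruby 2.2 or later (for String#unicode_normalize), and strip_diacritics, build_index and search_titles are hypothetical names used only for this illustration; an in-memory array stands in for the search_title column.

```ruby
# Hypothetical helper, not part of ascii_tic.
# Decomposes each character (NFD) and removes the combining marks,
# so "ö" becomes "o". Letters without a decomposition, such as "ø"
# or "ß", pass through unchanged.
def strip_diacritics(text)
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, '')
end

# Stand-in for the search_title column: pairs of [normalized key, title].
def build_index(titles)
  titles.map { |title| [strip_diacritics(title).downcase, title] }
end

# Normalize the search term the same way before comparing, so the
# searcher may type the diacritic or leave it out.
def search_titles(index, term)
  needle = strip_diacritics(term).downcase
  index.select { |key, _title| key.include?(needle) }
       .map { |_key, title| title }
end

index = build_index(['Motörhead biography', 'Cafe culture'])
puts search_titles(index, 'motorhead').inspect
puts search_titles(index, 'Motörhead').inspect
```

In the Rails app, the normalized value would presumably be written to search_title in a callback whenever the title changes, and the scope would normalize the term as shown in the answer above.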
Upvotes: 1