Reputation: 2113
Say I am looking to match the string "Bogata". I am looking for a regular expression or a short algorithm that would match the anglicized "Bogata", the correct "Bogotá", or even a misspelled "Bógatá".
Similarly, if I am looking to match the string "Sao Paolo", I would want to match both "Sao Paolo" and "São Paolo".
My question is specific to JavaScript and its RegExp object, but a more general solution would be preferable.
Upvotes: 0
Views: 658
Reputation: 50787
There's a USENET thread archived by Google Groups that discussed some of the issues involved in supporting Unicode in a regex extension. In that thread, Thomas 'PointedEars' Lahn mentioned his own implementation, jsx.regexp. I've never gotten around to analyzing it in depth, but on the surface it looks pretty good. It might be useful to you.
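Separately from jsx.regexp, here is a minimal sketch of one common approach for the accent part of the problem, assuming an environment with ES2015's `String.prototype.normalize`. It decomposes each string to NFD, strips the combining marks in the range U+0300 to U+036F, and compares what is left. The helper names `foldAccents` and `matchesIgnoringAccents` are made up for illustration. Note that this only folds diacritics: "Bógatá" and "Bogata" compare equal, but "Bogotá" and "Bogata" differ in a base letter, so matching those would need something more, such as an edit-distance check.

```javascript
// Hypothetical helper: decompose to NFD, then drop combining diacritical marks
// (U+0300 to U+036F), so "á" becomes "a" and "ã" becomes "a".
function foldAccents(text) {
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// Hypothetical helper: accent-insensitive and case-insensitive comparison.
function matchesIgnoringAccents(candidate, target) {
  return foldAccents(candidate).toLowerCase() === foldAccents(target).toLowerCase();
}

console.log(matchesIgnoringAccents("Bógatá", "Bogata"));       // true
console.log(matchesIgnoringAccents("São Paolo", "Sao Paolo")); // true
console.log(matchesIgnoringAccents("Bogotá", "Bogata"));       // false: base letters differ
```

If you need a RegExp rather than a plain comparison, you could fold the input with `foldAccents` first and then run an ordinary pattern such as `/sao paolo/i` against the folded text.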
Upvotes: 1