Aidan Clint

Reputation: 37

Match specific letter in any language

I've been searching for an answer to this question for almost an hour now, so I thought I would finally ask. I know you can use \p{L} to match any kind of letter from any language, but I haven't encountered any way to match a letter from any language that is the equivalent of some English letter, like the letter a.

For example, ideally I'd want to match any equivalent of "a" or "A" in any language (like: Å, å, Ǻ, ǻ, Ḁ, ḁ, ẚ, Ă, etc...)

Upvotes: 2

Views: 56

Answers (1)

Rodrigo Rodrigues

Reputation: 8576

Your best shot is something like:

myStr.normalize("NFD").replace(/\p{Diacritic}/gu, "")

It uses normalize(), which needs ES6 (ES2015); the \p{...} Unicode property escape additionally needs the u flag and ES2018. For example:

myStr = "ÁàÀăĂắẮâÂåÅǻǺäÄãÃąĄāĀȃȂḁḀćčçÇéÉèÈêÊěëËḝęēĒȇȆíÍìÌîÎïÏīĪȋȊľńñÑóÓòÒôÔöÖõÕōŌȏȎŘȓȒśšŠşŞșȘţŢțȚúÚùÙûÛůŮüÜűũŨūŪȗȖẘýÝẙÿŸȳȲźžŽż"
myStr.normalize("NFD").replace(/\p{Diacritic}/gu, "")
// "AaAaAaAaAaAaAaAaAaAaAaAaAcccCeEeEeEeeEeeeEeEiIiIiIiIiIiIlnnNoOoOoOoOoOoOoORrRssSsSsStTtTuUuUuUuUuUuuUuUuUwyYyyYyYzzZz"

It works even for accented non-Latin letters like "ᾰᾸӑӐёӂӁйЙ" and for odder cases, such as several letters sharing one accent, like b͝g.
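To get from stripping accents to actually matching every variant of a given letter, which is what the question asks for, one option is to compare each character's stripped form against the base letter. This is only a minimal sketch, assuming a runtime that supports Unicode property escapes; stripDiacritics and findVariantsOf are hypothetical helper names, not built-ins:

```javascript
// Hypothetical helper: decompose canonically, then drop diacritic code points.
function stripDiacritics(str) {
  return str.normalize("NFD").replace(/\p{Diacritic}/gu, "");
}

// Hypothetical helper: return the characters of `text` whose stripped,
// lowercased form equals `base` (for example "a").
function findVariantsOf(base, text) {
  // Compose first (NFC) so each accented letter is a single code point
  // wherever a precomposed form exists, then test code point by code point.
  return [...text.normalize("NFC")].filter(
    (ch) => stripDiacritics(ch).toLowerCase() === base
  );
}

console.log(findVariantsOf("a", "\u00C5b\u00E7 \u01FB \u1E00z \u0102"));
// → ["Å", "ǻ", "Ḁ", "Ă"]
```

Letters whose accent has no precomposed form stay split into base letter plus combining mark after NFC, so this sketch would return only the bare base letter for them.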

Where does it not work? Well, for things like ẚ (U+1E9A LATIN SMALL LETTER A WITH RIGHT HALF RING).

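The reason? Its decomposition to a + U+02BE ʾ (MODIFIER LETTER RIGHT HALF RING) is a compatibility decomposition, and NFD applies only canonical ones, so normalize("NFD") leaves the character in one piece and \p{Diacritic} has nothing to strip. U+02BE is also a spacing modifier letter rather than a combining mark, so it sits outside the U+0300-U+036F combining diacritics block. (Check here for reference)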
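One possible workaround, offered as a sketch rather than a complete fix: normalize with NFKD, which applies compatibility decompositions too, and remove U+02BE explicitly alongside the diacritics. stripCompat is a hypothetical name. Be aware that NFKD also rewrites other compatibility characters (the ligature ﬁ becomes fi, for instance), which may or may not be what you want.

```javascript
// Hypothetical helper: like the NFD version, but uses compatibility
// decomposition and explicitly drops U+02BE (MODIFIER LETTER RIGHT HALF RING).
function stripCompat(str) {
  return str.normalize("NFKD").replace(/[\p{Diacritic}\u02BE]/gu, "");
}

console.log("\u1E9A".normalize("NFD") === "\u1E9A"); // true: NFD leaves ẚ intact
console.log(stripCompat("\u1E9A"));                   // "a"
```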

Upvotes: 2
