Reputation: 357
I want to convert some city names with accented characters to plain, unaccented strings. For example:
<<"Sosúa">> to <<"Sosua">>
<<"Luperón">> to <<"Luperon">>
Any leads on how to do this?
Upvotes: 3
Views: 373
Reputation: 4010
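After decomposing the string to NFD, replace (re:replace/4) all the combining diacritics (non-spacing marks, matched by \p{Mn}), such as U+0301 (combining acute accent), with an empty string, then recompose the result to NFC:
String = "Luperón",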
{ok, Re} = re:compile("\\p{Mn}", [unicode]),
Output = unicode:characters_to_nfc_binary(
    re:replace(
        unicode:characters_to_nfd_binary(String),
        Re,
        "",
        [global]
    )
),
Output.
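If you need this for several city names, the same steps can be wrapped in a small function. This is only a sketch: the module name strip_accents and the function strip/1 are made up for illustration, and it assumes the input is a UTF-8 binary or a list of code points. Note that in Erlang source a literal like <<"Sosúa">> without the /utf8 flag is not valid UTF-8, so the examples below use <<"Sosúa"/utf8>>.

```erlang
-module(strip_accents).
-export([strip/1]).

%% Hypothetical helper (not part of OTP): removes combining marks
%% from a UTF-8 binary or a list of Unicode code points.
strip(Text) ->
    %% 1. Decompose: "ú" becomes "u" followed by U+0301.
    Decomposed = unicode:characters_to_nfd_binary(Text),
    %% 2. Drop every non-spacing mark (\p{Mn}); return a binary.
    Stripped = re:replace(Decomposed, "\\p{Mn}", "",
                          [global, unicode, {return, binary}]),
    %% 3. Recompose whatever is left to NFC.
    unicode:characters_to_nfc_binary(Stripped).
```

With that compiled, strip_accents:strip(<<"Sosúa"/utf8>>) gives <<"Sosua">> and strip_accents:strip(<<"Luperón"/utf8>>) gives <<"Luperon">>. Letters that have no canonical decomposition (for example "ø") are left as they are by this approach.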
The Elixir equivalent, for reference (it also relies on Erlang's unicode module):
string = "Luperón"
output =
  Regex.replace(~R<\p{Mn}>u, string |> :unicode.characters_to_nfd_binary(), "")
  |> :unicode.characters_to_nfc_binary()
Upvotes: 4