Rahib Rasheed

Reputation: 357

How to convert accented strings to regular strings in Erlang?

I want to convert some city names with accented characters to normal strings. For example:

<<"Sosúa">>  to  <<"Sosua">>

<<"Luperón">> to <<"Luperon">>

Any leads on how to do this?

Upvotes: 3

Views: 373

Answers (1)

julp

Reputation: 4010

  1. apply a Unicode Canonical Decomposition (NFD) with unicode:characters_to_nfd_binary/1 to rewrite characters like ó as two code points: o (U+006F) followed by a separate combining acute accent (U+0301)
  2. replace (re:replace/4) all those combining diacritics (non-spacing marks, like U+0301 above) using the regexp \p{Mn}
  3. optional: apply a Unicode Canonical Composition (NFC) with unicode:characters_to_nfc_binary/1 to recompose any remaining code points that can be combined
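String = "Luperón",
%% The unicode option makes re treat the pattern and the subject as UTF-8.
{ok, Re} = re:compile("\\p{Mn}", [unicode]),
Output = unicode:characters_to_nfc_binary(      % step 3: recompose
  re:replace(
    unicode:characters_to_nfd_binary(String),   % step 1: decompose
    Re,
    "",                                         % step 2: drop the combining marks
    [global]
  )
),
Output.  % <<"Luperon">>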
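
For reuse, the same three steps can be wrapped in a function. This is only a minimal sketch: the module name accents and the function strip/1 are made up for illustration and are not part of OTP, and it assumes OTP 20 or later, where the unicode normalization functions exist.

```erlang
-module(accents).
-export([strip/1]).

%% strip/1 is a hypothetical helper, not an OTP function.
%% Takes a UTF-8 binary or a list of code points and returns a UTF-8 binary.
strip(Text) ->
    %% 1. Decompose: "ó" becomes "o" followed by U+0301.
    Decomposed = unicode:characters_to_nfd_binary(Text),
    %% 2. Remove every non-spacing mark (Unicode category Mn).
    Stripped = re:replace(Decomposed, "\\p{Mn}", "",
                          [unicode, global, {return, binary}]),
    %% 3. Recompose whatever can still be composed.
    unicode:characters_to_nfc_binary(Stripped).
```

With this, accents:strip(<<"Sosúa"/utf8>>) gives <<"Sosua">> and accents:strip("Luperón") gives <<"Luperon">>. Note the /utf8 on the binary literal: without it, <<"Sosúa">> holds the Latin-1 byte for ú, which is not valid UTF-8, so the normalization step would return an error tuple instead of a binary.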

Equivalent in Elixir, for reference (it also relies on Erlang's unicode module):

string = "Luperón"
output = 
  Regex.replace(~R<\p{Mn}>u, string |> :unicode.characters_to_nfd_binary(), "")
  |> :unicode.characters_to_nfc_binary()

Upvotes: 4
