Mpizos Dimitris

Reputation: 4991

Convert special letters to English letters in R

Is there a way to convert special letters in a text to English letters in R? For example:

Æ -> AE
Ø -> O
Å -> A

Edit: The reason I need this conversion is that R can't see that these two words are the same:

library(stringdist)
stringdist('oversættelse','oversaettelse')
[1] 2
grepl('oversættelse','oversaettelse')
[1] FALSE

Some people tend to write using only English characters and others do not. In order to compare texts, I need to have them in the same format.

Upvotes: 4

Views: 2930

Answers (2)

bdecaf

Reputation: 4732

I recently had a very similar problem and was pointed to the question "Unicode normalization (form C) in R: convert all characters with accents into their one-unicode-character form?"

The gist is that many of these special characters have more than one Unicode representation, which breaks text comparisons. The suggested solution is the stri_trans_nfc function from the stringi package. The package also has a function, stri_trans_general, that supports transliteration, which might be exactly what you need.
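A minimal sketch of both functions, assuming the stringi package is installed. The "Latin-ASCII" transform identifier is my choice for this example and is not named in the answer above. The non-ASCII letters are written as \u escapes so the snippet does not depend on the file encoding.

```r
library(stringi)  # assumption: stringi is installed

# Two representations of "é": precomposed (U+00E9) vs. "e" + combining acute (U+0301)
composed   <- "\u00e9"
decomposed <- "e\u0301"

# After NFC normalization both strings use the precomposed form and compare equal
same_after_nfc <- stri_trans_nfc(composed) == stri_trans_nfc(decomposed)

# Transliterate to plain ASCII; the inputs are "oversættelse" and "Æ Ø Å"
x <- c("overs\u00e6ttelse", "\u00c6 \u00d8 \u00c5")
ascii <- stri_trans_general(x, "Latin-ASCII")
ascii
```

With this transform, Æ becomes AE, Ø becomes O and Å becomes A, which matches the mapping asked for in the question. If a different convention is needed (for example Å to AA), an explicit replacement table is probably the safer route.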

Upvotes: 7

Xavier Nayrac

Reputation: 570

You can use chartr for one-to-one character replacements:

x <- "ØxxÅxx"
chartr("ØÅ", "OA", x)
[1] "OxxAxx"

And/or gsub, when one letter maps to several:

y <- "Æabc"
gsub("Æ", "AE", y)
[1] "AEabc"
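The two calls can be combined into one function. This is only a sketch: to_ascii is a hypothetical helper name, and the lowercase mappings are an addition not shown in the answer. The letters are written as \u escapes so the code does not depend on the file encoding.

```r
# Hypothetical helper: replace Æ/Ø/Å (and lowercase forms) with ASCII letters
to_ascii <- function(x) {
  # One-to-many replacements need gsub(); chartr() only maps one character to one
  x <- gsub("\u00c6", "AE", x, fixed = TRUE)  # Æ -> AE
  x <- gsub("\u00e6", "ae", x, fixed = TRUE)  # æ -> ae
  # One-to-one replacements: Ø Å ø å -> O A o a
  chartr("\u00d8\u00c5\u00f8\u00e5", "OAoa", x)
}

# "oversættelse" now matches the ASCII spelling from the question
to_ascii("overs\u00e6ttelse") == "oversaettelse"
```

Both input texts would have to go through to_ascii before being passed to stringdist or grepl, so that they are compared in the same format.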

Upvotes: -2
