Reputation: 4991
Is there a way to convert special letters in a text to English letters in R? For example:
Æ -> AE
Ø -> O
Å -> A
Edit: The reason I need this conversion is that R can't see that these two words are the same:
stringdist('oversættelse','oversaettelse')
[1] 2
grepl('oversættelse','oversaettelse')
[1] FALSE
Some people tend to write using only English characters and others do not. In order to compare texts, I need to have them in the same format.
Upvotes: 4
Views: 2930
Reputation: 4732
I recently had a very similar problem and was pointed to the question "Unicode normalization (form C) in R: convert all characters with accents into their one-unicode-character form?"
The gist is that many of these special characters have more than one Unicode representation, which will mess with text comparisons. The suggested solution is to use the stringi package function stri_trans_nfc.
stringi also has a function, stri_trans_general, that supports transliteration, which might be exactly what you need.
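A minimal sketch of both functions, assuming the stringi package is installed. The "Latin-ASCII" transform identifier is my own choice for illustration and is not named in the linked question; the non-ASCII characters are written as \u escapes so that the snippet does not depend on the file encoding.

```r
library(stringi)

# Two representations of "å": base letter + combining ring vs. precomposed
decomposed <- "a\u030A"
composed   <- "\u00E5"

# NFC normalization brings both to the same composed form
same_after_nfc <- stri_trans_nfc(decomposed) == stri_trans_nfc(composed)

# Transliterate to plain ASCII with ICU's "Latin-ASCII" transform
words <- c("overs\u00E6ttelse", "oversaettelse")
ascii <- stri_trans_general(words, "Latin-ASCII")

# After transliteration the two spellings can be compared directly
ascii[1] == ascii[2]
```

Normalizing first and transliterating second should put both spellings from the question into the same form before they are passed to stringdist or grepl.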
Upvotes: 7
Reputation: 570
You can use chartr
x <- "ØxxÅxx"
chartr("ØÅ", "OA", x)
[1] "OxxAxx"
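And/or gsub, which you need for one-to-many replacements such as Æ -> AE, because chartr only maps characters one-to-one: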
y <- "Æabc"
gsub("Æ", "AE", y)
[1] "AEabc"
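The two calls can be combined into one helper. This is only a sketch: the function name to_ascii_danish and its small mapping table are hypothetical, it covers just the letters from the question, and it assumes a UTF-8 session.

```r
# Hypothetical helper: the name and the mapping are illustrative only
to_ascii_danish <- function(x) {
  # One-to-many replacements first; chartr() cannot express these
  x <- gsub("\u00C6", "AE", x, fixed = TRUE)  # Æ
  x <- gsub("\u00E6", "ae", x, fixed = TRUE)  # æ
  # One-to-one replacements for Ø, Å, ø, å
  chartr("\u00D8\u00C5\u00F8\u00E5", "OAoa", x)
}

to_ascii_danish(c("\u00D8xx\u00C5xx", "\u00C6abc", "overs\u00E6ttelse"))
```

A hand-written table like this only handles the characters you list, so every further letter has to be added explicitly.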
Upvotes: -2