Reputation: 83
I have a df with various names, many of which contain accented/non-English characters. I used a separate gsub call for each character I wanted to replace, and that worked for many of them; for several characters, however, nothing was replaced at all.
An example of a non-working gsub call: gsub("č","c",df,fixed=TRUE)
Here are the characters that were not replaced: ł ř ń š ž Ľ ţ ę č ć
My wish is to replace them with their English "look-alike" equivalent: l r n s z L t e c c
In addition to the gsub attempts, I have also tried chartr("łřńšžĽţęčć","lrnszLtecc",df$Name). Like the gsub attempts, this failed as well.
df<-data.frame(Name=c("Stipe Miočić","Duško Todorović","Michał Oleksiejczuk","Jiři Prochazka","Bartosz Fabiński","Damir Hadžović","Ľudovit Klein","Diana Belbiţă","Joanna Jędrzejczyk" ))
Above is a df with several of the names that are giving me trouble. The problem is that when you run this and view the resulting df, the characters that are giving me problems are gone and their English versions are shown instead. However, this does not happen in my main df, which I'm working on with directly scraped data.
Any insight into this problem and how to resolve it would be greatly appreciated.
Upvotes: 2
Views: 407
Reputation: 18611
Use stringi::stri_trans_general:
library(stringi)
df<-data.frame(Name=c("Stipe Miočić","Duško Todorović","Michał Oleksiejczuk","Jiři Prochazka","Bartosz Fabiński","Damir Hadžović","Ľudovit Klein","Diana Belbiţă","Joanna Jędrzejczyk" ))
stri_trans_general(df$Name, "Latin-ASCII")
Results:
[1] "Stipe Miocic" "Dusko Todorovic" "Michal Oleksiejczuk"
[4] "Jiri Prochazka" "Bartosz Fabinski" "Damir Hadzovic"
[7] "Ludovit Klein" "Diana Belbita" "Joanna Jedrzejczyk"
See R proof.
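To write the result back into the data frame and confirm that nothing non-ASCII is left, something like the following should work. It is only a sketch: the enc2utf8() step assumes the scraped strings may arrive marked with a different encoding, and the sample names are written with \u escapes so the script does not depend on the encoding of the source file.

```r
library(stringi)

# Sample names written with \u escapes (e.g. \u010d is "č") so the result
# does not depend on how this file is saved.
df <- data.frame(
  Name = c("Stipe Mio\u010di\u0107", "Diana Belbi\u0163\u0103", "Joanna J\u0119drzejczyk"),
  stringsAsFactors = FALSE
)

# Assumption: converting to UTF-8 first guards against strings that were
# scraped with a different declared encoding.
df$Name <- stri_trans_general(enc2utf8(df$Name), "Latin-ASCII")

# TRUE for each name that now consists of ASCII characters only.
ascii_only <- stri_enc_isascii(df$Name)
print(df$Name)
print(ascii_only)
```

If any element of ascii_only comes back FALSE on the real data, that row is the place to look for an unusual character or encoding.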
Upvotes: 2
Reputation: 24790
You can use stringi::stri_replace_all_fixed:
library(stringi)
stri_replace_all_fixed(df$Name,
                       c("ł","ř","ń","š","ž","Ľ","ţ","ę","č","ć"),
                       c("l","r","n","s","z","L","t","e","c","c"),
                       vectorize_all = FALSE)
[1] "Stipe Miocic" "Dusko Todorovic" "Michal Oleksiejczuk" "Jiri Prochazka" "Bartosz Fabinski"
[6] "Damir Hadzovic" "Ludovit Klein" "Diana Belbită" "Joanna Jedrzejczyk"
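Note that "Belbită" in the output still contains ă, because that character is not in the pattern vector; a fixed replacement only touches the characters it is given. A rough way to see which characters still need a mapping is to list every non-ASCII character left in the column. The vector x below is a made-up stand-in for the replaced names, with ă written as the escape \u0103:

```r
library(stringi)

# Hypothetical stand-in for the output of the replacement above.
x <- c("Diana Belbit\u0103", "Damir Hadzovic")

# Extract every character outside the ASCII range; strings without a match
# contribute nothing because of omit_no_match = TRUE.
leftover <- unique(unlist(
  stri_extract_all_regex(x, "[^\\x00-\\x7F]", omit_no_match = TRUE)
))
print(leftover)
```

Whatever shows up in leftover can then be added to the pattern and replacement vectors.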
Upvotes: 1