I am cleaning my dataset and removing all accents on letters and such. In order to do this I use gsub (see code below). It works perfectly fine but I am sure there is a more convenient way to do it. I've heard about gsubfn but I have not been able to figure out how it works. Any tips on that or on any other way to make this code more efficient?
Ech_final$lastname.y <- gsub(' ', "", Ech_final$lastname.y)
Ech_final$lastname.y <- gsub("'", "", Ech_final$lastname.y)
Ech_final$lastname.y <- gsub("/", "", Ech_final$lastname.y)
Ech_final$lastname.y <- gsub("Ë", "E", Ech_final$lastname.y)
Ech_final$lastname.y <- gsub("É", "E", Ech_final$lastname.y)
Ech_final$lastname.y <- gsub("È", "E", Ech_final$lastname.y)
Ech_final$lastname.y <- gsub("Ç", "C", Ech_final$lastname.y)
Ech_final$lastname.y <- gsub("À", "A", Ech_final$lastname.y)
Ech_final$lastname.y <- gsub("Ù", "U", Ech_final$lastname.y)
Ech_final$lastname.y <- gsub("Œ", "OE", Ech_final$lastname.y)
Ech_final$lastname.y <- gsub("Ï", "I", Ech_final$lastname.y)
Thanks!
Here are several alternatives. The first uses gsubfn, as requested by the subject line of the question. The next two use iconv or chartr for the substitutions whose replacement is a single character, and gsub for the remaining replacements. The last two use gsub for all substitutions, driven by Reduce or a loop.
1) gsubfn: gsubfn is used like this, where the first argument is a pattern that matches each character, the second is a translation list and the third is the input.
library(gsubfn)
x <- " '/ËÉÈÇÀÙŒÏ" # input
L <- list(" " = "", "'" = "", "/" = "", Ë = "E", É = "E", È = "E",
Ç = "C", À = "A", Ù = "U", Œ = "OE", Ï = "I")
gsubfn(".", L, x)
## [1] "EEECAUOEI"
2) gsub/iconv: Another approach mixes gsub, for the cases where the replacement is not one character, with iconv. The input x is defined above.
Encoding(x) # note the encoding
## [1] "latin1"
x |>
gsub(pattern = "Œ", replacement = "OE") |>
gsub(pattern = "[ '/]", replacement = "") |>
iconv("latin1", "ascii//translit")
## [1] "EEECAUOEI"
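The "latin1" result above depends on the platform. As a hedged sketch, assuming a session where the strings are UTF-8 (typical on Linux and macOS), the same idea would pass "UTF-8" as the source encoding. The variable names y and res are only for illustration, and the accented letters are written with \u escapes so the snippet does not depend on the file's encoding. No output is shown because the transliteration that iconv produces varies between platforms.

```r
# Assumption: strings are UTF-8; escaped literals like this are always marked UTF-8.
y <- "\u00cb\u00c9\u00c8\u00c7\u00c0\u00d9\u00cf"  # the letters ËÉÈÇÀÙÏ
Encoding(y)                                         # "UTF-8" for an escaped literal

# Same call as in (2), but with the source encoding matching the input.
res <- iconv(y, from = "UTF-8", to = "ASCII//TRANSLIT")
res  # platform-dependent: check the result on your own system before relying on it
```

If the result contains "?" or extra punctuation on your system, the chartr approach in (3) gives full control over each replacement.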
3) gsub/chartr: A third alternative is the same as (2) except that chartr is used in place of iconv. The input x is defined above.
x |>
gsub(pattern = "Œ", replacement = "OE") |>
gsub(pattern = "[ '/]", replacement = "") |>
chartr(old = "ËÉÈÇÀÙÏ", new = "EEECAUI")
## [1] "EEECAUOEI"
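To apply this to the column from the question, the pipeline can be wrapped in a function. This is a minimal sketch: the helper name clean_names and the three sample surnames are made up for illustration, and the data frame is a toy stand-in for Ech_final. The accented letters are written with \u escapes so the script behaves the same regardless of the file's encoding.

```r
# Hypothetical helper wrapping approach (3); the name is not from the question.
clean_names <- function(s) {
  s <- gsub("\u0152", "OE", s, fixed = TRUE)   # Œ needs a two-character replacement
  s <- gsub("[ '/]", "", s)                     # drop spaces, apostrophes and slashes
  # One-to-one replacements: ËÉÈÇÀÙÏ become EEECAUI
  chartr("\u00cb\u00c9\u00c8\u00c7\u00c0\u00d9\u00cf", "EEECAUI", s)
}

# Toy stand-in for the data frame in the question (invented sample values)
Ech_final <- data.frame(
  lastname.y = c("D'H\u00c9BERT", "LE C\u0152UR", "FRAN\u00c7OIS/NO\u00cbL"),
  stringsAsFactors = FALSE
)

Ech_final$lastname.y <- clean_names(Ech_final$lastname.y)
Ech_final$lastname.y
## [1] "DHEBERT"      "LECOEUR"      "FRANCOISNOEL"
```

Because gsub and chartr are both vectorized, the whole column is cleaned in one call, with no loop over rows.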
4) gsub/Reduce: This alternative performs the gsub calls shown in the question, but in a more compact form using Reduce. The list L and the input x are from (1).
Reduce(function(s, nm) gsub(nm, L[[nm]], s), names(L), x)
## [1] "EEECAUOEI"
5) gsub/for: One could use a simple for loop to run the gsub calls in the question. The list L and the input x are from (1).
xx <- x
for(nm in names(L)) xx <- gsub(nm, L[[nm]], xx)
xx
## [1] "EEECAUOEI"
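One caveat for (4) and (5): gsub treats each name as a regular expression by default. That is harmless for the characters in L, but a name such as "." would match every character. A hedged sketch of the difference follows. The list L2, the input x2 and the two result names are made up for illustration, and passing fixed = TRUE is one way to force literal matching.

```r
# Hypothetical translation list containing a regex metacharacter (".")
L2 <- list("." = "", "\u00c9" = "E")   # the second name is É
x2 <- "J.-\u00c9. DUPR\u00c9"          # invented sample input: J.-É. DUPRÉ

# Default regex matching: "." matches every character, so everything is deleted.
regex_result <- Reduce(function(s, nm) gsub(nm, L2[[nm]], s), names(L2), x2)
regex_result
## [1] ""

# Literal matching: only actual dots are removed, then É is replaced.
fixed_result <- Reduce(function(s, nm) gsub(nm, L2[[nm]], s, fixed = TRUE),
                       names(L2), x2)
fixed_result
## [1] "J-E DUPRE"
```

The same fixed = TRUE argument can be added to the gsub call inside the for loop.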