David Potrel

Reputation: 111

Gsub to gsubfn, how does it transfer?

I am cleaning my dataset and removing all accents on letters and such. In order to do this I use gsub (see code below). It works perfectly fine but I am sure there is a more convenient way to do it. I've heard about gsubfn but I have not been able to figure out how it works. Any tips on that or on any other way to make this code more efficient?

Ech_final$lastname.y <- gsub(' ', "", Ech_final$lastname.y)
Ech_final$lastname.y <- gsub("'", "", Ech_final$lastname.y)
Ech_final$lastname.y <- gsub("/", "", Ech_final$lastname.y)
Ech_final$lastname.y <- gsub("Ë", "E", Ech_final$lastname.y)
Ech_final$lastname.y <- gsub("É", "E", Ech_final$lastname.y)
Ech_final$lastname.y <- gsub("È", "E", Ech_final$lastname.y)
Ech_final$lastname.y <- gsub("Ç", "C", Ech_final$lastname.y)
Ech_final$lastname.y <- gsub("À", "A", Ech_final$lastname.y)
Ech_final$lastname.y <- gsub("Ù", "U", Ech_final$lastname.y)
Ech_final$lastname.y <- gsub("Œ", "OE", Ech_final$lastname.y)
Ech_final$lastname.y <- gsub("Ï", "I", Ech_final$lastname.y)

Thanks!

Upvotes: 3

Views: 188

Answers (1)

G. Grothendieck

Reputation: 269885

Here are several alternatives. The first uses gsubfn, as requested in the title of the question. The next two use iconv or chartr for the substitutions whose replacement is a single character, and gsub for the rest. The last two use gsub for all substitutions, driven by Reduce or a for loop.

1) gsubfn. gsubfn is used like this: the first argument is a pattern that matches each character, the second is a translation list and the third is the input. A character that does not match any name in the list is left unchanged.

library(gsubfn)

x <- " '/ËÉÈÇÀÙŒÏ"   # input

L <- list(" " = "", "'" = "", "/" = "", Ë = "E", É = "E", È = "E", 
    Ç = "C", À = "A", Ù = "U", Œ = "OE", Ï = "I")

gsubfn(".", L,  x)
## [1] "EEECAUOEI"
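As a small sketch of how the same call carries over to a whole vector of names: gsubfn is vectorized over its input, so one call can clean every element. The sample names and the variable names tr and surnames are made up for illustration, and the keys are written as \u escapes only so that the snippet does not depend on the encoding of the script file.

```r
library(gsubfn)

# Translation list built with setNames; the names are the characters to replace.
# "\u00c9" is É and "\u0152" is Œ.
tr <- setNames(list("", "", "E", "OE"), c(" ", "'", "\u00c9", "\u0152"))

# Made-up sample input (hypothetical data, not from the question)
surnames <- c("D'ANG\u00c9LO", "LE C\u0152UR")

# "." matches one character at a time; each match is looked up in tr
cleaned <- gsubfn(".", tr, surnames)
cleaned
## [1] "DANGELO" "LECOEUR"
```

Letters such as D or N have no entry in tr, so they come through untouched.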

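2) gsub/iconv. Another approach mixes gsub, for the cases where the replacement is not a single character, with iconv for the rest. The input x is defined above. The from argument of iconv must match the encoding of x (latin1 here; on many systems it is UTF-8), and the result of //translit can differ between platforms. The |> pipe requires R 4.1 or later.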

Encoding(x)  # note the encoding
## [1] "latin1"

x |>
  gsub(pattern = "Œ", replacement = "OE") |>
  gsub(pattern = "[ '/]", replacement = "") |>
  iconv("latin1", "ascii//translit")
## [1] "EEECAUOEI"

3) gsub/chartr. A third alternative is the same as (2) except that chartr is used in place of iconv. The input x is defined above.

x |>
  gsub(pattern = "Œ", replacement = "OE") |>
  gsub(pattern = "[ '/]", replacement = "") |>
  chartr(old = "ËÉÈÇÀÙÏ", new = "EEECAUI")
## [1] "EEECAUOEI"

4) gsub/Reduce. This alternative performs the gsub calls shown in the question, but in a more compact form using Reduce. The list L and the input x are from (1).

Reduce(function(s, nm) gsub(nm, L[[nm]], s), names(L), x)
## [1] "EEECAUOEI"
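One caveat with this form is that each name of the list is passed to gsub as a regular expression. That is harmless for the characters used here, but a key such as "." would match every character. A minimal sketch, assuming a hypothetical mapping L2 and input x2 that are not part of the question, passes fixed = TRUE so that the keys are treated as literal strings:

```r
# Hypothetical mapping that contains a regex metacharacter (".");
# "\u00c9" is É, written as an escape to avoid encoding issues.
L2 <- setNames(list("", "E"), c(".", "\u00c9"))
x2 <- "J.-P. DUPR\u00c9"

# fixed = TRUE makes gsub treat each key as a literal string, not a regex
out <- Reduce(function(s, nm) gsub(nm, L2[[nm]], s, fixed = TRUE), names(L2), x2)
out
## [1] "J-P DUPRE"
```

Without fixed = TRUE the "." key would delete every character and the result would be an empty string.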

5) gsub/for. One could use a simple for loop to run the gsub calls in the question. The list L and the input x are from (1).

xx <- x
for(nm in names(L)) xx <- gsub(nm, L[[nm]], xx)
xx
## [1] "EEECAUOEI"
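To tie this back to the data frame in the question, any of the alternatives can be wrapped in a function and applied to the column in one assignment. The sketch below uses approach (3) in base R only. The helper name strip_accents and the sample rows are assumptions made for illustration; only the column name lastname.y comes from the question.

```r
# Hypothetical helper wrapping approach (3); not part of any package.
# Accented letters are written as \u escapes so the code does not depend
# on the encoding of the script file.
strip_accents <- function(s) {
  s <- gsub("\u0152", "OE", s, fixed = TRUE)  # Œ needs a two-character replacement
  s <- gsub("[ '/]", "", s)                   # drop spaces, apostrophes and slashes
  # Ë É È Ç À Ù Ï map one-to-one onto E E E C A U I
  chartr("\u00cb\u00c9\u00c8\u00c7\u00c0\u00d9\u00cf", "EEECAUI", s)
}

# Made-up rows standing in for the question's Ech_final data frame
Ech_final <- data.frame(
  lastname.y = c("D'ANG\u00c9LO", "LE F\u00c8VRE", "C\u0152UR/NO\u00cbL"),
  stringsAsFactors = FALSE
)

Ech_final$lastname.y <- strip_accents(Ech_final$lastname.y)
Ech_final$lastname.y
## [1] "DANGELO"   "LEFEVRE"   "COEURNOEL"
```

Because gsub and chartr are both vectorized, the whole column is cleaned in a single call, which replaces the eleven separate assignments in the question.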

Upvotes: 2
