Hibai

Reputation: 11

Character replacement with gsub not working inside a function

I'm trying to replace some unexpected characters in a data frame in R. According to "Replace multiple arguments with gsub", the gsub function is supposed to work in these cases, so I tried that approach.

The values I have in the first column of the data frame are the following:

La Flèche Wallonne
Liège - Bastogne - Liège
Tour de Romandie
Giro d´Italia
Critérium du Dauphiné

And the code's been implemented as follows:

callChangeCharacters <- function(results){
for(i in 1:nrow(results)){
    race <- results[i,1]
    race <- gsub("é","e",race)
    race <- gsub("â","a",race)
    race <- gsub("ó","o",race)
    race <- gsub("ž","z",race)
    race <- gsub("ú","u",race)
    race <- gsub("ø","o",race)
    race <- gsub("Å›","s",race)
    race <- gsub("Å‚","l",race)
    race <- gsub("ä‚","a",race)
    race <- gsub("è","e",race)
    race <- gsub("Ã","a",race)
    race <- gsub("Å","s",race)
    race <- gsub("Ä","c",race)
    race <- gsub("´","'",race)
    results[i,1] <- race
}
return(results)
}

If I run the code inside the for loop directly, I get the expected result:

La Fleche Wallonne
Liege - Bastogne - Liege
Tour de Romandie
Giro d'Italia
Criterium du Dauphine

However, if I call the function, the result isn't the same, and the unwanted characters aren't corrected:

> correctedDF <- callChangeCharacters(results)
> correctedDF
                                        V1
La Flèche Wallonne
Liège - Bastogne - Liège
Tour de Romandie
Giro d´Italia
Critérium du Dauphiné

The output of dput(results) is the following (this version of results is longer, but the problem is the same):

> dput(results)
structure(list(V1 = c("Santos Tour Down Under", "Paris - Nice", 
"Tirreno-Adriatico", "Milano-Sanremo", "Volta Ciclista a Catalunya", 
"E3 Prijs Vlaanderen - Harelbeke", "Gent - Wevelgem", "Ronde van Vlaanderen / Tour des Flandres", 
"Vuelta Ciclista al Pais Vasco", "Paris - Roubaix", "Amstel Gold Race", 
"La Flèche Wallonne", "Liège - Bastogne - Liège", "Tour de Romandie", 
"Giro d´Italia", "Critérium du Dauphiné", "Tour de Suisse", 
"Tour de France", "Tour de Pologne", NA, "Clasica Ciclista San Sebastian", 
"Eneco Tour", "Vuelta a España", "Vattenfall Cyclassics", "GP Ouest France - Plouay", 
"Grand Prix Cycliste de Québec", "Grand Prix Cycliste de Montréal", 
"Il Lombardia", "Tour of Beijing")), .Names = "V1", row.names = c(1L, 
1686L, 4601L, 6743L, 6943L, 9274L, 9473L, 9673L, 9880L, 11581L, 
11779L, 11978L, 12168L, 12367L, 14264L, 21957L, 24734L, 27727L, 
35542L, 37354L, 37470L, 37627L, 39885L, 47277L, 47441L, 47624L, 
47788L, 47952L, 48147L), class = "data.frame")

Any idea of why it doesn't work inside the function?

Thanks in advance.

Upvotes: 1

Views: 1782

Answers (2)

Jonathon

Reputation: 31

I had a similar issue, which occurred because I was using the source function to import my code without specifying that the encoding parameter should be "utf-8".

source("./code.R")

Upon inspecting a function I had read in, I realised that certain special characters had been changed by the source function and hence the function was not working as intended. The solution was to set the encoding parameter to "utf-8".

source("./code.R", encoding="utf-8")
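If changing how the file is read is not an option, another way to sidestep the problem is to keep the source file pure ASCII by writing the special characters as \u escapes. The following is only a minimal sketch: the name cleanup_escaped is hypothetical, it covers just a few of the characters from the question, and it assumes the strings in the data really are UTF-8.

```r
# Sketch: replacement patterns written as \u escapes, so the file contains
# only ASCII and the patterns do not depend on how source() decodes it.
# cleanup_escaped is a hypothetical name, not part of the original code.
cleanup_escaped <- function(race) {
    race <- gsub("\u00e9", "e", race, fixed = TRUE)  # e with acute accent
    race <- gsub("\u00e8", "e", race, fixed = TRUE)  # e with grave accent
    race <- gsub("\u00f1", "n", race, fixed = TRUE)  # n with tilde
    race <- gsub("\u00b4", "'", race, fixed = TRUE)  # acute accent used as apostrophe
    race
}

cleaned <- cleanup_escaped(c("Li\u00e8ge - Bastogne - Li\u00e8ge",
                             "Giro d\u00b4Italia"))
print(cleaned)
```

Calling Encoding() on the data column and on the patterns inside the function can also help confirm whether the two sides actually agree.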

Upvotes: 3

asachet

Reputation: 6921

Your code works. You should also replace ñ (see "Vuelta a España").

The gsub function is vectorized so you don't need the loop at all.

cleanup <- function(race) {
    race <- gsub("é","e",race)
    race <- gsub("â","a",race)
    race <- gsub("ó","o",race)
    race <- gsub("ž","z",race)
    race <- gsub("ú","u",race)
    race <- gsub("ø","o",race)
    race <- gsub("Å›","s",race)
    race <- gsub("Å‚","l",race)
    race <- gsub("ä‚","a",race)
    race <- gsub("è","e",race)
    race <- gsub("Ã","a",race)
    race <- gsub("Å","s",race)
    race <- gsub("Ä","c",race)
    race <- gsub("´","'",race)
    return(race)
}

results$V1 <- cleanup(results$V1)

Why do you use a data.frame if you only have one column? It would be more convenient to just keep a vector race.

If you really want a function that works on results directly, you still don't need a loop.

callChangeCharacters <- function(results) {
    results[,1] <- cleanup(results[,1])
    return(results)
}
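Because gsub is vectorized, the long run of near-identical calls can also be driven from a small table of replacements. This is just a sketch under assumptions: the names from, to and cleanup_table are made up for illustration, only a handful of characters are listed, and the patterns are written as \u escapes so the example does not depend on the file's encoding.

```r
# Hypothetical replacement table: from[k] is replaced by to[k].
from <- c("\u00e9", "\u00e8", "\u00f3", "\u00fa", "\u00f1", "\u00b4")
to   <- c("e",      "e",      "o",      "u",      "n",      "'")

# Apply each replacement in turn to the whole vector at once.
# fixed = TRUE treats the patterns as literal strings, not regular expressions.
cleanup_table <- function(race) {
    for (k in seq_along(from)) {
        race <- gsub(from[k], to[k], race, fixed = TRUE)
    }
    race
}

races <- c("Vuelta a Espa\u00f1a", "Grand Prix Cycliste de Qu\u00e9bec", NA)
cleaned_races <- cleanup_table(races)
print(cleaned_races)
```

Missing values such as the NA row in the question's data pass through unchanged, so no special handling is needed for them.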

Upvotes: 0
