D. Studer

Reputation: 1875

Replace wrong encodings in a data-frame

How can I replace all occurrences of certain strings (e.g. "Ãœ") with their corresponding special characters? (Unfortunately, the character encodings are wrong.)

For example, I'd like to replace "Ã¼" with "ü", "Ã¤" with "ä", etc. I can do this with the following code, but how can I apply it to every column in the data.frame? And how can I do this most efficiently?

df$column <- gsub("Ã¼", "ü", df$column)

Thank you!

Upvotes: 0

Views: 435

Answers (1)

hdkrgr

Reputation: 1736

There might be several ways to go about this depending on what the actual problem is:

A:

If your original data (e.g. a CSV file) looks fine and you only see the bad encoding in R, you should try reading the file with the correct encoding. Most reader and writer functions take a parameter for this, and UTF-8 should work in most cases. You could, for example, try read.csv(your_file_path, fileEncoding='UTF-8') or similar, depending on how you read your data.
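As a minimal sketch of option A, the snippet below writes a small UTF-8 file and reads it back with an explicit fileEncoding. The temporary file, the column name and the sample value are made up for illustration, and the example assumes R is running in a UTF-8 locale.

```r
# Write a tiny CSV as raw UTF-8 bytes so the example is self-contained.
# "\u00fc" is the escape for "ü"; escapes keep the script encoding-independent.
path <- tempfile(fileext = ".csv")
con <- file(path, open = "wb")
writeBin(charToRaw(enc2utf8("name\nM\u00fcller\n")), con)
close(con)

# Tell read.csv how the file is encoded instead of relying on the locale default.
df <- read.csv(path, fileEncoding = "UTF-8", stringsAsFactors = FALSE)
print(df$name)
```

If the real file was saved in a different encoding (for example "latin1"), that name would go in fileEncoding instead.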

B:

The data is actually broken (i.e. someone messed up the encoding earlier, and the problem is not in how you read it), and you now want to fix it manually for just a couple of characters, e.g. ä, ö, ü, ß.

Then, using the dplyr package you could:

  1. make a function that fixes the errors:

    my_function <- function(str) {
      str <- gsub("Ã¼", "ü", str)
      str <- gsub("Ã¤", "ä", str)
      # ... additional replacements ...
      str
    }

  2. Apply it to every character-column of your data frame:

    df <- df %>% mutate_if(is.character, my_function)
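For reference, the same idea can be sketched in base R without dplyr. The names garbled, intended and fix_encoding and the sample data frame are hypothetical, and the lookup assumes the typical case where UTF-8 bytes were misread as Latin-1, so that "ü" shows up as "Ã¼".

```r
# Garbled sequences and the characters they should have been.
# Written as \u escapes so the script does not depend on its own file encoding.
garbled  <- c("\u00c3\u00bc", "\u00c3\u00a4", "\u00c3\u00b6")  # "Ã¼", "Ã¤", "Ã¶"
intended <- c("\u00fc", "\u00e4", "\u00f6")                    # "ü", "ä", "ö"

# Apply every replacement in turn; fixed = TRUE matches the text literally.
fix_encoding <- function(x) {
  for (i in seq_along(garbled)) {
    x <- gsub(garbled[i], intended[i], x, fixed = TRUE)
  }
  x
}

# Made-up example data with two character columns and one numeric column.
df <- data.frame(
  name = c("M\u00c3\u00bcller", "J\u00c3\u00a4ger"),
  city = c("K\u00c3\u00b6ln", "Bern"),
  n    = c(1, 2),
  stringsAsFactors = FALSE
)

# Repair only the character columns; other columns are left untouched.
is_chr <- vapply(df, is.character, logical(1))
df[is_chr] <- lapply(df[is_chr], fix_encoding)
print(df)
```

With a recent dplyr (1.0 or later), the last two lines could probably be written as df %>% mutate(across(where(is.character), fix_encoding)), since mutate_if has been superseded by across().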

Upvotes: 1
