Reputation: 39
I have a table with many strings that contain some weird characters that I'd like to replace with the "original" ones. Ä became ä, ö became ö, so I replace each ö with an ö in the text. It works, however, ß became à < U+009F> and I am unable to replace it...
# Works just fine:
gsub('ö', 'REPLACED', "Testing string ö")
# this does not work
gsub("Ã<U+009F>", "REPLACED", "Testing string Ã<U+009F> ")
# this does not work as well...
gsub("â<U+0080><U+0093>", "REPLACED", "Testing string â<U+0080><U+0093> ")
How do I tell R to replace These parts with some letter I want to insert?
Upvotes: 0
Views: 676
Reputation: 887118
As there are metacharacters (+
- to signify one or more), in order to evaluate it literally either escape (as @boski mentioned in the solution) or use fixed = TRUE
sub("Ã<U+009F>", "REPLACED", "Testing string Ã<U+009F> ", fixed = TRUE)
#[1] "Testing string REPLACED "
Upvotes: 1
Reputation: 2467
You have to escape the +
symbol, as it is a regex
command.
> gsub("Ã<U\\+009F>", "REPLACED", "Testing string Ã<U+009F> ")
[1] "Testing string REPLACED "
> gsub("â<U\\+0080><U\\+0093>", "REPLACED", "Testing string â<U+0080><U+0093> ")
[1] "Testing string REPLACED "
Upvotes: 0