I have a data frame that I need to visualize in RStudio. In two of its columns (the names of the points of origin and destination), some of the characters look like this:
St.P<U+00BA>lten
This happens in other words too, for example:
W<U+00BA>rgl, V<U+00BA>cklabruck
It happens only in those two columns of the data frame.
How do I remove or replace those characters? I think the code needs to search for this particular string
<U+00BA>
in the two columns and replace it wherever it finds it. Does anyone know some code that will help me achieve this?
Thanks!
I'm guessing that you are showing us the display from some program other than R. If you look at ?Quotes at the R console, you will see that Unicode characters are written after an escaped u, e.g. "\u00BA". That character isn't really a lowercase o with an umlaut (U+00BA is the masculine ordinal indicator, º), but perhaps the authors of that data source are using a different character set. So you could match that oddball spelling of "Vöcklabruck" with this regex test:
grepl("V\u00BAcklabruck", R_reference_to_your_column)
That should be TRUE for the Vöcklabruck entries; a pattern of just "\u00BA" would flag all of the examples you mentioned.
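To flag every affected entry at once, a minimal sketch is to search for the º character on its own. This assumes the column really contains the U+00BA character (not the literal text "<U+00BA>"); the vector `places` is made-up sample data standing in for one of your columns.

```r
# Hypothetical sample data, assuming the strings hold the actual U+00BA character
places <- c("St.P\u00BAlten", "W\u00BArgl", "V\u00BAcklabruck", "Wien")

# fixed = TRUE matches the character literally, with no regex interpretation
has_ordinal <- grepl("\u00BA", places, fixed = TRUE)
print(has_ordinal)
```

Subsetting with `places[has_ordinal]` then shows only the rows that need fixing.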
A "real" lowercase umlaut-o is <U+00F6> in your source's notation and "\u00F6" in R's notation, so I suspect you actually want to do this:
your_dfrm$yourcol <- gsub("\u00BA", "\u00F6", your_dfrm$yourcol, fixed = TRUE)
Most systems these days are set up to display umlauted characters, i.e. vowels with a diaeresis.
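Since the question mentions two columns, here is a minimal sketch of applying the same gsub() replacement to both in one step. The data frame `trips` and its column names `origin` and `destination` are hypothetical stand-ins for your own; it again assumes the data holds the U+00BA character itself.

```r
# Hypothetical data frame; "origin" and "destination" stand in for your two columns
trips <- data.frame(
  origin      = c("St.P\u00BAlten", "Wien"),
  destination = c("W\u00BArgl", "V\u00BAcklabruck"),
  stringsAsFactors = FALSE
)

cols <- c("origin", "destination")

# Replace U+00BA (masculine ordinal indicator) with U+00F6 (o with diaeresis)
# in each listed column; fixed = TRUE avoids any regex interpretation
trips[cols] <- lapply(trips[cols], function(col) {
  gsub("\u00BA", "\u00F6", col, fixed = TRUE)
})

print(trips)
```

Listing the columns in `cols` keeps the rest of the data frame untouched.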
If I treat the text in your question as the literal content of your strings, I can turn them back into Unicode characters using:
library(stringr)
x = c("W<U+00BA>rgl", "V<U+00BA>cklabruck")
unicode_chars = str_match(x, "<U\\+([a-zA-Z0-9]+)>")
str_replace(x, "<U\\+[a-zA-Z0-9]+>", paste0("\\u", unicode_chars[, 2]))
# Output:
[1] "Wºrgl" "Vºcklabruck"
But maybe your strings are already stored as Unicode and it's a problem with how your system displays them, in which case this won't help.
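As a base-R alternative to the stringr code above, this minimal sketch decodes every <U+XXXX> sequence in each string (str_replace() only replaces the first match per string). It assumes the data really contains the literal text "<U+00BA>"; strings without such sequences come back unchanged, so it does no harm if the text is already stored as Unicode. The function name `decode_u_escapes` is hypothetical.

```r
# Hypothetical helper: turn every literal "<U+XXXX>" sequence into the
# character with that hex code point, using base R only
decode_u_escapes <- function(x) {
  pattern <- "<U\\+([0-9A-Fa-f]{4,6})>"
  # gregexpr() finds all matches in each string, not just the first
  matches <- gregexpr(pattern, x, perl = TRUE)
  # regmatches<- swaps each matched "<U+XXXX>" for its decoded character
  regmatches(x, matches) <- lapply(regmatches(x, matches), function(tokens) {
    hex <- gsub("^<U\\+|>$", "", tokens)  # "<U+00BA>" -> "00BA"
    vapply(strtoi(hex, 16L), intToUtf8, character(1))
  })
  x
}

x <- c("W<U+00BA>rgl", "St.P<U+00BA>lten", "Wien")
decoded <- decode_u_escapes(x)
print(decoded)
```

Note that this restores º (U+00BA), not ö; to get the umlaut you would still follow it with a gsub() like the one in the first answer. Both columns can be handled with `your_dfrm[cols] <- lapply(your_dfrm[cols], decode_u_escapes)`.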