Reputation: 11

Need to replace characters in a data frame

I have a data frame that I need to visualize in RStudio. In two columns (the names of the points of origin and destination), some of the characters show up like this:

St.P<U+00BA>lten

This happens for different words like

W<U+00BA>rgl, V<U+00BA>cklabruck

This happens only in those two columns of the data frame.

How do I remove those characters, or replace them? I think I need to search for this particular string

<U+00BA>

in the two columns and replace it wherever it occurs. Does anyone know some code that will help me achieve this?

Thanks!

Upvotes: 1

Answers (2)

IRTFM

Reputation: 263342

I'm guessing that you are showing us the display from some program other than R. If you look at ?Quotes at the R console you will see that Unicode characters are written with an escaped u, e.g. "\u00BA". That character isn't really a lowercase o with an umlaut (U+00BA is the masculine ordinal indicator, º), but perhaps the authors of that data source are using a different character set. So you could match that oddball spelling of "Vöcklabruck" with this logical test:

 grepl("V\u00BAcklabruck", R_reference_to_your_column, fixed = TRUE)

That should be TRUE for every entry that contains that spelling.

A "real" lowercase o with an umlaut is <U+00F6> in your source's notation and "\u00F6" in R's notation, so I suspect you actually want to do this:

  your_dfrm$yourcol <- gsub("\u00BA", "\u00F6", your_dfrm$yourcol, fixed = TRUE)

Most systems these days are set up to display "umlauted characters", i.e. vowels that carry a diaeresis.
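To cover both columns at once, the replacement can be wrapped in a small helper and applied with lapply. This is only a sketch: the data frame `trips`, the column names `origin` and `destination`, and the helper `fix_o_umlaut` are made up for illustration, and it assumes every U+00BA in those columns really stands for ö. It handles both possibilities, the literal text "<U+00BA>" and the actual U+00BA character.

```r
# Hypothetical data; the column names are assumptions, not from the question.
trips <- data.frame(
  origin      = c("St.P\u00BAlten", "W<U+00BA>rgl"),
  destination = c("V\u00BAcklabruck", "Wien"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: swap the stand-in for a real o-umlaut (U+00F6).
fix_o_umlaut <- function(x) {
  # Case 1: the string contains the literal text "<U+00BA>"
  x <- gsub("<U+00BA>", "\u00F6", x, fixed = TRUE)
  # Case 2: the string contains the actual U+00BA character
  gsub("\u00BA", "\u00F6", x, fixed = TRUE)
}

cols <- c("origin", "destination")
trips[cols] <- lapply(trips[cols], fix_o_umlaut)
print(trips)
```

If the columns are factors rather than character vectors, convert them with as.character() first, since gsub returns character vectors anyway.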

Upvotes: 1

Marius

Reputation: 60070

If I treat the text in your question as the literal content of your strings, I can turn them back into Unicode characters using:

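library(stringr)
x = c("W<U+00BA>rgl", "V<U+00BA>cklabruck")
# Capture the hex code point inside each <U+XXXX> marker (second column of the match matrix)
unicode_chars = str_match(x, "<U\\+([a-zA-Z0-9]+)>")
# Replace the first marker in each string with a \uXXXX escape built from that code point
str_replace(x, "<U\\+[a-zA-Z0-9]+>", paste0("\\u", unicode_chars[, 2]))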
# Output:
[1] "Wºrgl"       "Vºcklabruck"

But maybe your strings are already stored as Unicode and it's a problem with how your system displays them, in which case this won't help.
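One way to tell the two situations apart is to test for the marker as literal text and for the character itself, and to compare character counts. A minimal sketch in base R, using two made-up strings that represent each case:

```r
# First element holds the marker as plain text, second holds the real character.
x <- c("W<U+00BA>rgl", "W\u00BArgl")

# TRUE where the eight literal characters "<U+00BA>" are stored in the string
has_marker_text <- grepl("<U+00BA>", x, fixed = TRUE)

# TRUE where the single character U+00BA is stored in the string
has_real_char <- grepl("\u00BA", x, fixed = TRUE)

# The literal marker makes the string longer: 12 characters versus 5
n_chars <- nchar(x)

print(data.frame(x, has_marker_text, has_real_char, n_chars))
```

If has_real_char is TRUE for your data, the strings are already Unicode and only the display is affected; if has_marker_text is TRUE, the marker really is stored as text and a conversion like the one above applies.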

Upvotes: 0
