phil

Reputation: 191

Can't remove unicode characters from strings using gsub

I've reviewed many other Stack Overflow questions and answers about removing unicode characters from strings, but none of them seem to work for me!

Exact problem reproduction:

library(rvest)  # provides read_html(), html_table() and the %>% pipe
event = as.data.frame(read_html("https://www.bestfightodds.com/events/ufc-226-miocic-vs-cormier-1447") %>% html_table(fill=TRUE))
event$X5Dimes

As you can see, there are embedded up and down arrows. I'd like to remove them so that only the line remains. For example

"-310<U+25BC>" would become "-310"

I've tried many gsub patterns to remove them -- some of my own creation, some from other Stack Overflow answers -- and nothing is working! Some example patterns are below

event$X5Dimes = gsub("<.+>", "", event$X5Dimes)
event$X5Dimes = gsub("\\S+\\s+|-", "", event$X5Dimes)
event$X5Dimes = gsub("^\\s*<U\\+\\w+>\\s*", "", event$X5Dimes)
event$X5Dimes = gsub("\\<U[^\\>]*\\>", "", event$X5Dimes)  

Can anyone help? Much appreciated -- losing my mind! Thanks!

Upvotes: 1

Views: 313

Answers (1)

Ibrahim

Reputation: 6088

Try matching the arrow characters themselves:

event$X5Dimes = gsub("▼|▲", "", event$X5Dimes)
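
A likely reason the patterns in the question fail: `<U+25BC>` is usually just how R prints a character it cannot display in the current locale (common on Windows), so the string probably holds the single character U+25BC rather than the literal text "<U+25BC>". If pasting the arrows into the script is awkward, a minimal sketch using Unicode escapes could look like this. The `odds` vector is a made-up stand-in for `event$X5Dimes`, and the second pattern assumes the odds only ever contain a sign, digits and a decimal point.

```r
# Hypothetical stand-in for event$X5Dimes (the real column comes from the scraped table).
odds <- c("-310\u25BC", "+255\u25B2", "-120")

# Remove the up (U+25B2) and down (U+25BC) triangles by code point.
cleaned <- gsub("[\u25B2\u25BC]", "", odds)

# Alternative: whitelist the characters to keep (sign, digits, decimal point)
# and drop everything else, which also catches any other stray symbols.
cleaned_whitelist <- gsub("[^-+0-9.]", "", odds)

print(cleaned)
```

The whitelist version is the more defensive of the two, but it would also strip any legitimate text in the column, so it only makes sense when every cell is expected to be a number.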

Upvotes: 1
