Reputation: 180
I am reading XML files with varied formats into R, using UTF-8 encoding. I'm having trouble replacing non-ASCII negative signs, which look like "−". I can't simply strip all non-ASCII characters because I want to keep the negative sign. The gsub below does not work, and I've tried lots of different options for the pattern.
in_text = "<td align=\"left\" rowspan=\"1\" colspan=\"1\">−0.68 (1.04)</td>"
gsub(pattern='−', replacement='-', in_text)
<td align=\"left\" rowspan=\"1\" colspan=\"1\">−0.68 (1.04)</td>
I can see they are non-ASCII:
tools::showNonASCII(in_text)
<td align="left" rowspan="1" colspan="1"><e2><88><92>0.68 (1.04)</td>
Upvotes: 2
Views: 815
Reputation: 206253
The text you've posted in the question doesn't appear to actually contain the non-ASCII character. I think your source matches this:
in_text = "<td align=\"left\" rowspan=\"1\" colspan=\"1\">\u22120.68 (1.04)</td>"
in_text
# [1] "<td align=\"left\" rowspan=\"1\" colspan=\"1\">−0.68 (1.04)</td>"
The character "\u2212" (U+2212, MINUS SIGN) seems to match the output you get from tools::showNonASCII: its UTF-8 encoding is the three bytes e2 88 92. So if you use that escaped character as the pattern in the replacement, it should work fine:
gsub(pattern='\u2212', replacement='-', in_text)
# [1] "<td align=\"left\" rowspan=\"1\" colspan=\"1\">-0.68 (1.04)</td>"
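If you want to double-check the match before replacing, here is a minimal sketch, assuming the character in your data really is U+2212. The variable names minus_bytes and out_text are just for illustration, and fixed = TRUE is an optional addition that treats the pattern as a literal string rather than a regular expression:

```r
# U+2212 MINUS SIGN, written as an escape so the script's own encoding does not matter
in_text <- "<td align=\"left\" rowspan=\"1\" colspan=\"1\">\u22120.68 (1.04)</td>"

# Bytes of the minus sign in UTF-8; compare these with the <e2><88><92>
# reported by tools::showNonASCII
minus_bytes <- charToRaw(enc2utf8("\u2212"))
print(minus_bytes)

# Replace it with the ASCII hyphen-minus, matching the pattern literally
out_text <- gsub("\u2212", "-", in_text, fixed = TRUE)
print(out_text)

# Check that every remaining code point is ASCII (below 128)
print(all(utf8ToInt(out_text) < 128L))
```

If the bytes printed for your real data differ from e2 88 92, the file probably contains a different dash-like character (an en dash, for example), and the pattern would need the matching escape instead.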
Upvotes: 2