Reputation: 4953
I'm using Web-Harvest to scrape a website and generate an XML file with the data.
I'm getting ugly nodes like <name> </name>. Using normalize-space() didn't help, so I opened the file in a hex viewer and found that the space corresponds to the bytes 'c2a0'. I looked around for a solution, but nothing I found helped...
To sum up, what I want is to remove that weird space (using XQuery or XPath 1.0/2.0), so that I get an empty node <name/>.
PS: the encoding used is 'iso-8859-1'.
Upvotes: 0
Views: 1096
Reputation: 16917
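You can use translate() to remove specific characters. The UTF-8 byte sequence c2a0 encodes the character U+00A0 (no-break space), and hexadecimal 0xA0 is 160 in decimal, so codepoints-to-string(160) (XPath 2.0 / XQuery) gives you a string containing just that character. normalize-space() leaves it alone because U+00A0 is not one of the XML whitespace characters (space, tab, carriage return, line feed).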
Together:
translate(your node text, codepoints-to-string(160), "")
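For context, here is a minimal XQuery sketch of how that expression might be applied to a whole element. The $name variable and the sample element are made up for illustration and stand in for whatever node your scraper produces. normalize-space() is applied afterwards so that any ordinary whitespace is trimmed as well.

```xquery
(: Hypothetical sample: a <name> element whose only content is U+00A0. :)
let $name := <name>&#160;</name>

(: Strip the no-break space, then trim any ordinary whitespace that remains. :)
let $clean := normalize-space(
                translate(string($name), codepoints-to-string(160), "")
              )

(: Rebuild the element from the cleaned text. :)
return <name>{ $clean }</name>
```

When the cleaned text is the empty string, the constructor creates no text node, so the result should serialize as an empty <name/> element. Note that this removes every U+00A0, including ones between words. If you would rather keep a separator there, pass " " as the third argument of translate() and let normalize-space() collapse it.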
Upvotes: 1