Reputation: 343
I'm trying to parse an XML file in Java, and some lines contain the character reference &#153;. However, when I do
((String) myXPath.evaluate(node, STRING));
I get a square symbol instead of ™. My machine runs Linux and the XML encoding is UTF-8. I can't understand how to properly encode this exact symbol. &#8482; is decoded perfectly well...
I create the Document instance in the following way:
File xmlFile = new File(path);
FileInputStream fileIS = new FileInputStream(xmlFile);
xmlDocument = builder.parse(fileIS);
Upvotes: 1
Views: 2711
Reputation: 163655
The character reference &#153; represents the character with Unicode code point 153 (U+0099), which is an unprintable control character. It isn't the trademark symbol. 153 (0x99) is the trademark symbol in the Windows-1252 character set, but that's irrelevant here: a character reference always names a Unicode code point. You need to use the Unicode code point of the trademark symbol, which is 8482 (U+2122), i.e. &#8482; - https://en.wikipedia.org/wiki/Trademark_symbol
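To see the difference for yourself, here is a minimal sketch that parses a tiny document and prints the code points that come out of the XPath evaluation. The class name CharRefDemo, the helper rootText and the element name t are made up for illustration; only the standard JAXP classes are used.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

class CharRefDemo {
    // Hypothetical helper: parses the XML text and returns the string value
    // of the root element <t>, evaluated through XPath.
    static String rootText(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (String) xpath.evaluate("/t", doc, XPathConstants.STRING);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String text = rootText("<t>&#153;&#8482;</t>");
        // First character comes from &#153;, second from &#8482;.
        System.out.printf("U+%04X U+%04X%n",
                text.codePointAt(0), text.codePointAt(1));
    }
}
```

The first code point is the control character U+0099, which most fonts render as a square or nothing at all; the second is U+2122, the trademark symbol. So the parser is doing exactly what the reference says, and the fix belongs in the XML data.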
Note that the numbers used in character references have nothing to do with the file encoding. In fact, that's the whole point of using them: they survive changes of encoding.
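As a rough illustration of that last point, the sketch below stores the same document in two different encodings and parses both. The class name EncodingDemo and the helper parseAs are hypothetical; ISO-8859-1 is chosen because it cannot represent the trademark symbol as a literal character, so the reference is the only way to carry it.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class EncodingDemo {
    // Hypothetical helper: builds a document declared and stored in the given
    // encoding, parses it, and returns the text of the root element.
    static String parseAs(String charsetName) {
        try {
            String xml = "<?xml version=\"1.0\" encoding=\"" + charsetName
                    + "\"?><t>&#8482;</t>";
            byte[] bytes = xml.getBytes(charsetName);
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String utf8 = parseAs("UTF-8");
        String latin1 = parseAs("ISO-8859-1");
        // Both parses yield the same single character, U+2122.
        System.out.println(utf8.equals(latin1));
        System.out.printf("U+%04X%n", utf8.codePointAt(0));
    }
}
```

Whatever encoding the file is saved in, &#8482; still resolves to U+2122, and &#153; would still resolve to U+0099.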
Upvotes: 1