Reputation: 2323
I'm using a DocumentBuilder to parse XML files. However, the specification for the project requires that within text nodes, strings like &quot; and &lt; be returned literally, and not decoded as characters (" and <).
A previous similar question, "Read escaped quote as escaped quote from xml", received one answer that seems to be specific to Apache, and another that appears to simply not do what it says it does. I'd love to be proven wrong on either count, however :)
For reference, here is some code:
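import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

File file = new File(fileName);
DocumentBuilderFactory docBderFac = DocumentBuilderFactory.newInstance();
DocumentBuilder docBder = docBderFac.newDocumentBuilder();
Document doc = docBder.parse(file);
NodeList textElmntLst = doc.getElementsByTagName("text"); // element name assumed to be "text"
Element textElmnt = (Element) textElmntLst.item(0);
NodeList txts = textElmnt.getChildNodes();
String txt = txts.item(0).getNodeValue(); // item() already returns a Node, so no cast is needed
System.out.println(txt);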
I would like that println() call to produce things like
&quot;3&gt;2&quot;
instead of
"3>2"
which is what currently happens. Thanks!
Upvotes: 4
Views: 4554
Reputation: 41135
I'm using a DocumentBuilder to parse XML files. However, the specification for the project requires that within text nodes, strings like &quot; and &lt; be returned literally, and not decoded as characters (" and <).
Bad requirement. Don't do that.
Or at least consider carefully why you think you want or need it.
CDATA sections and escapes are a tactic for allowing you to pass text like quotes and '<' characters through XML and not have XML confuse them with markup. They have no meaning in themselves, and when you pull them out of the XML, you should accept them as the quotes and '<' characters they were intended to represent.
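A small sketch of the point above, assuming only the JDK's built-in DOM parser (the class name EscapesAreTransparent and the helper rootText are made up for illustration): an entity-escaped text node and a CDATA section carrying the same characters come out of the parser as the identical string.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class EscapesAreTransparent {
    // Hypothetical helper: parses an XML string and returns the
    // concatenated text content of its root element.
    static String rootText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            // Wrapped so callers need not handle checked parser exceptions.
            throw new IllegalArgumentException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // Same characters, two different spellings in the source XML.
        String escaped = rootText("<text>&quot;3&gt;2&quot;</text>");
        String cdata = rootText("<text><![CDATA[\"3>2\"]]></text>");
        System.out.println(escaped);               // "3>2"
        System.out.println(cdata);                 // "3>2"
        System.out.println(escaped.equals(cdata)); // true
    }
}
```

Because the parser has already discarded the distinction by the time you see the node, the standard DOM API offers no call that tells you which spelling the file used.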
Upvotes: 2
Reputation: 2323
Both good answers, but both a little too heavyweight for this very small-scale application. I ended up going with the total hack of just stripping out all &s (I do this later anyway to &s that aren't part of escapes). It's ugly, but it's working.
Edit: I understand there's all kinds of things wrong with this, and that the requirement is stupid. It's for a school project, all that matters is that it work in one case, and the requirement is not my fault :)
Upvotes: -3
Reputation: 597342
You can turn them back into XML-encoded form with StringEscapeUtils from Apache Commons Lang:
StringEscapeUtils.escapeXml(str);
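If pulling in Commons Lang is not an option, a JDK-only stand-in is easy to sketch. The class ReEscape and the helper escapeXmlText are hypothetical. The helper escapes only the five characters that have predefined XML entities, so it is not a drop-in replacement for StringEscapeUtils.escapeXml, whose exact output varies between Commons Lang versions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ReEscape {
    // Hypothetical helper: re-encodes the five characters with predefined
    // XML entities (&, <, >, ", '). The library method may escape more.
    static String escapeXmlText(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader("<text>&quot;3&gt;2&quot;</text>")));
        // The parser has already decoded the entities...
        String decoded = doc.getDocumentElement().getTextContent();
        // ...so re-encode them after the fact.
        System.out.println(escapeXmlText(decoded)); // &quot;3&gt;2&quot;
    }
}
```

Keep in mind that re-escaping produces a canonical spelling, not the original bytes: a literal > in the file, or a numeric reference such as &#34;, comes back as &gt; or &quot; respectively.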
Upvotes: 3
Reputation: 6795
One approach might be to try dom4j, and to use the Node.asXML() method. It might return a deep structure, so it might need cloning to get just the node or text you want without any of its children.
Upvotes: 1