Reputation: 437
I have an XML file structured like this:
<?xml version="1.0" encoding="UTF-8"?>
<entry id="young_1">
    <sense n="1">
        <cit type="translation" lang="fr">
            <quote>jeune</quote>
        </cit>
        <re type="phr">
            <sense>
                <cit type="translation" lang="fr">
                    <quote>un jeune homme</quote>
                </cit>
            </sense>
        </re>
    </sense>
    <sense n="2">
        <cit type="translation" lang="fr">
            <quote>petits <hi rend="i">mpl</hi></quote>
        </cit>
    </sense>
</entry>
I need to parse it using Java to obtain each quote value contained in a cit element with the attribute type="translation", with two restrictions:
- For a quote element, I only need its own text, not the text content of nested nodes. For example, <quote>petits <hi rend="i">mpl</hi></quote> should yield only petits.
- I don't need the quote element contained in an re element.
Finally, I need to obtain this result:
entry ==> young_1
translations ==> [jeune;petits]
For now, my Java code is:
// load XML document for DOM parsing
Document doc = loadXMLFromString(xmlContent);
// now try to parse it
NodeList nList = doc.getElementsByTagName("sense");
for (int i = 0; i < nList.getLength(); i++) {
    Node nNode = nList.item(i);
    if (nNode.getNodeType() == Node.ELEMENT_NODE) {
        Element eElement = (Element) nNode;
        NodeList fieldNodes = eElement.getElementsByTagName("cit");
        for (int j = 0; j < fieldNodes.getLength(); j++) {
            Node fieldNode = fieldNodes.item(j);
            NamedNodeMap attributes = fieldNode.getAttributes();
            Node attr = attributes.getNamedItem("type");
            if (attr != null) {
                if (attr.getTextContent().equals("translation")) {
                    // how can I access <quote> element ???
                }
            }
        }
    }
}
But I don't know how to access the <quote> element
...
Upvotes: 1
Views: 1781
Reputation: 72854
You can access the <quote> element exactly the same way you're accessing the <cit> elements, by using the Element#getElementsByTagName(String name) method:
Node attr = attributes.getNamedItem("type");
if (attr != null) {
    if (attr.getTextContent().equals("translation")) {
        Element citElement = (Element) fieldNode;
        NodeList quoteNodeList = citElement.getElementsByTagName("quote");
        if (quoteNodeList.getLength() > 0) {
            Node quoteNode = quoteNodeList.item(0);
            String quote = quoteNode.getTextContent();
            ...
        }
    }
}
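Note that getTextContent() also returns the text of descendant elements, so for <quote>petits <hi rend="i">mpl</hi></quote> it yields "petits mpl". If only the quote's own text is wanted (as in the expected result [jeune;petits]), one option is to concatenate just the direct text-node children. A minimal sketch follows; the class DirectText and the helpers ownText and parseRoot are hypothetical names, not part of the question's code:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
class DirectText {

    // Concatenates only the text nodes that are direct children of the element,
    // skipping the text of nested elements such as <hi>.
    static String ownText(Element element) {
        StringBuilder sb = new StringBuilder();
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                sb.append(child.getNodeValue());
            }
        }
        return sb.toString().trim();
    }

    // Parses an XML string and returns its root element.
    // Checked exceptions are wrapped to keep the sketch short.
    static Element parseRoot(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element quote = parseRoot("<quote>petits <hi rend=\"i\">mpl</hi></quote>");
        System.out.println(quote.getTextContent()); // prints "petits mpl"
        System.out.println(ownText(quote));         // prints "petits"
    }
}
```

In the loop above, this would amount to replacing quoteNode.getTextContent() with ownText((Element) quoteNode).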
To exclude nodes contained in a <re> node, you can check the parent of the <sense> node using nNode.getParentNode().getNodeName(), e.g.:
if (!nNode.getParentNode().getNodeName().equals("re")) {
    ....
}
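One caveat: getElementsByTagName returns all descendants, so the outer <sense n="1"> still reaches the <cit> nested inside <re> even when the inner <sense> is skipped. A possible way around this is to test each <cit> itself for an <re> ancestor. The following is a rough, self-contained sketch under that assumption; the class TranslationExtractor, its helper methods, and the SAMPLE constant (a compact, well-formed copy of the sample XML) are hypothetical names introduced for illustration:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class for illustration; not the answerer's code.
class TranslationExtractor {

    static final String SAMPLE =
            "<entry id=\"young_1\">"
          + "<sense n=\"1\"><cit type=\"translation\" lang=\"fr\"><quote>jeune</quote></cit>"
          + "<re type=\"phr\"><sense><cit type=\"translation\" lang=\"fr\">"
          + "<quote>un jeune homme</quote></cit></sense></re></sense>"
          + "<sense n=\"2\"><cit type=\"translation\" lang=\"fr\">"
          + "<quote>petits <hi rend=\"i\">mpl</hi></quote></cit></sense>"
          + "</entry>";

    // True if any ancestor element of the node has the given tag name.
    static boolean hasAncestor(Node node, String tagName) {
        for (Node p = node.getParentNode(); p != null; p = p.getParentNode()) {
            if (p.getNodeType() == Node.ELEMENT_NODE && tagName.equals(p.getNodeName())) {
                return true;
            }
        }
        return false;
    }

    // Text of the element's direct text-node children only.
    static String ownText(Element element) {
        StringBuilder sb = new StringBuilder();
        for (Node c = element.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.TEXT_NODE) {
                sb.append(c.getNodeValue());
            }
        }
        return sb.toString().trim();
    }

    // Collects the quote text of every translation <cit> that is not inside an <re>.
    static List<String> translations(Element entry) {
        List<String> result = new ArrayList<>();
        NodeList cits = entry.getElementsByTagName("cit");
        for (int i = 0; i < cits.getLength(); i++) {
            Element cit = (Element) cits.item(i);
            if (!"translation".equals(cit.getAttribute("type")) || hasAncestor(cit, "re")) {
                continue;
            }
            NodeList quotes = cit.getElementsByTagName("quote");
            if (quotes.getLength() > 0) {
                result.add(ownText((Element) quotes.item(0)));
            }
        }
        return result;
    }

    // Parses an XML string and returns its root element (checked exceptions wrapped).
    static Element parseRoot(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element entry = parseRoot(SAMPLE);
        System.out.println("entry ==> " + entry.getAttribute("id"));
        System.out.println("translations ==> [" + String.join(";", translations(entry)) + "]");
    }
}
```

With the sample data this prints entry ==> young_1 and translations ==> [jeune;petits]. Walking up all ancestors rather than only the immediate parent also covers deeper nesting inside <re>.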
Upvotes: 1