I have the following XML file:
<RecordSet>
  <Record>
    <ID>001</ID>
    <TermList>
      <Term>Term1</Term>
      <Term>Term2</Term>
      <Term>Term3</Term>
    </TermList>
  </Record>
  <Record>
    <ID>002</ID>
    <TermList>
      <Term>Term3</Term>
      <Term>Term4</Term>
      <Term>Term5</Term>
    </TermList>
  </Record>
</RecordSet>
and need to parse it into an "ID-Term" file, i.e.,
001 Term1
001 Term2
001 Term3
002 Term3
002 Term4
002 Term5
Currently I have the following application:
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.*;
import javax.xml.xpath.*;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
public class MedlineParser {
    public static void main(String[] args) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder;
        Document doc = null;
        try {
            builder = factory.newDocumentBuilder();
            doc = builder.parse("/home/andrej/Documents/test.xml");
            // Create XPathFactory object
            XPathFactory xpathFactory = XPathFactory.newInstance();
            // Create XPath object
            XPath xpath = xpathFactory.newXPath();
            try {
                XPathExpression expr1 = xpath.compile("/RecordSet/Record/ID/text()");
                NodeList nodes1 = (NodeList) expr1.evaluate(doc, XPathConstants.NODESET);
                for (int i = 0; i < nodes1.getLength(); i++) {
                    String id = nodes1.item(i).getNodeValue();
                    XPathExpression expr2 = xpath.compile("/RecordSet/Record/TermList/Term/text()");
                    NodeList nodes2 = (NodeList) expr2.evaluate(doc, XPathConstants.NODESET);
                    for (int j = 0; j < nodes2.getLength(); j++) {
                        System.out.println(id + " " + nodes2.item(i).getNodeValue());
                    }
                }
            } catch (XPathExpressionException e) {
                e.printStackTrace();
            }
        } catch (IOException | ParserConfigurationException | SAXException e) {
            e.printStackTrace();
        }
    }
}
Unfortunately, the program output is currently:
001 Term1
001 Term1
001 Term1
001 Term1
001 Term1
001 Term1
002 Term2
002 Term2
002 Term2
002 Term2
002 Term2
002 Term2
Any idea what's wrong with my XPath expressions?
It seems that you are printing the Cartesian product of all IDs and terms.
This would be easier: iterate over all records (with XPath /RecordSet/Record), and for each record evaluate the ID (with XPath ID) and the terms (with XPath TermList/Term), using the Record node as context node.
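The per-record approach described above could be sketched roughly as follows. This is only an illustration, not a drop-in replacement: the class name RecordContextParser and the helpers idTermPairs and parse are made up for the example, and the XML is parsed from an in-memory string rather than from the file path in the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class name, chosen for this sketch only.
class RecordContextParser {

    // Returns one "ID Term" string per Term element, grouped by Record.
    static List<String> idTermPairs(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Compile each expression once, outside the loops.
        XPathExpression recordExpr = xpath.compile("/RecordSet/Record");
        XPathExpression idExpr = xpath.compile("ID");            // relative to a Record
        XPathExpression termExpr = xpath.compile("TermList/Term"); // relative to a Record

        List<String> pairs = new ArrayList<>();
        NodeList records = (NodeList) recordExpr.evaluate(doc, XPathConstants.NODESET);
        for (int i = 0; i < records.getLength(); i++) {
            Node record = records.item(i);
            // The Record node is the context node for both relative expressions.
            String id = (String) idExpr.evaluate(record, XPathConstants.STRING);
            NodeList terms = (NodeList) termExpr.evaluate(record, XPathConstants.NODESET);
            for (int j = 0; j < terms.getLength(); j++) {
                pairs.add(id + " " + terms.item(j).getTextContent());
            }
        }
        return pairs;
    }

    // Helper for the sketch: parse XML held in a string.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<RecordSet>"
                + "<Record><ID>001</ID><TermList>"
                + "<Term>Term1</Term><Term>Term2</Term><Term>Term3</Term>"
                + "</TermList></Record>"
                + "<Record><ID>002</ID><TermList>"
                + "<Term>Term3</Term><Term>Term4</Term><Term>Term5</Term>"
                + "</TermList></Record>"
                + "</RecordSet>";
        for (String pair : idTermPairs(parse(xml))) {
            System.out.println(pair);
        }
    }
}
```

Because the ID and the terms are both looked up relative to the same Record node, an ID can never be paired with terms from another record, and no index arithmetic is needed.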
Two issues:
1. The XPath must take into account the index of the ID node being iterated in the first loop. Your current XPath selects all Term nodes in the document for every ID node. You should change it to something like this (XPath positions are 1-based, hence i + 1):
XPathExpression expr2 = xpath.compile("/RecordSet/Record[" + (i + 1) + "]/TermList/Term/text()");
2. You have a typo in the inner for loop. You should use j instead of i:
for (int j = 0; j < nodes2.getLength(); j++) {
    System.out.println(id + " " + nodes2.item(j).getNodeValue());
}
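Putting both fixes together, the loops might look like the sketch below. The class name PositionalPredicateParser and the helpers idTermPairs and parse are invented for the example, and the XML is read from a string instead of the file in the question. It also assumes every Record contains exactly one ID, so that the i-th ID text node belongs to the i-th Record; if a Record could lack an ID, the indices would drift apart.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class name, chosen for this sketch only.
class PositionalPredicateParser {

    static List<String> idTermPairs(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        List<String> pairs = new ArrayList<>();

        NodeList ids = (NodeList) xpath
                .compile("/RecordSet/Record/ID/text()")
                .evaluate(doc, XPathConstants.NODESET);
        for (int i = 0; i < ids.getLength(); i++) {
            String id = ids.item(i).getNodeValue();
            // Fix 1: restrict the terms to the (i + 1)-th Record (XPath is 1-based).
            // Assumption: each Record has exactly one ID, so indices line up.
            String termPath = "/RecordSet/Record[" + (i + 1) + "]/TermList/Term/text()";
            NodeList terms = (NodeList) xpath
                    .compile(termPath)
                    .evaluate(doc, XPathConstants.NODESET);
            for (int j = 0; j < terms.getLength(); j++) {
                // Fix 2: index the inner node list with j, not i.
                pairs.add(id + " " + terms.item(j).getNodeValue());
            }
        }
        return pairs;
    }

    // Helper for the sketch: parse XML held in a string.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<RecordSet>"
                + "<Record><ID>001</ID><TermList>"
                + "<Term>Term1</Term><Term>Term2</Term><Term>Term3</Term>"
                + "</TermList></Record>"
                + "<Record><ID>002</ID><TermList>"
                + "<Term>Term3</Term><Term>Term4</Term><Term>Term5</Term>"
                + "</TermList></Record>"
                + "</RecordSet>";
        for (String pair : idTermPairs(parse(xml))) {
            System.out.println(pair);
        }
    }
}
```

This keeps the structure of the original program. The trade-off is that a new expression is compiled for every record, which is fine for small files but worth noting for large ones.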