Andrej

Reputation: 3839

XPath to select list of nodes in Java

I have the following XML file:

<RecordSet>
  <Record>
    <ID>001</ID>
    <TermList>
      <Term>Term1</Term>
      <Term>Term2</Term>
      <Term>Term3</Term>
    </TermList>
  </Record>
  <Record>
    <ID>002</ID>
    <TermList>
      <Term>Term3</Term>
      <Term>Term4</Term>
      <Term>Term5</Term>
    </TermList>
  </Record>
</RecordSet>

and need to parse it into an "ID-Term" file, i.e.:

001 Term1
001 Term2
001 Term3
002 Term3
002 Term4
002 Term5

Currently I have the following application:

import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import javax.xml.parsers.*;
import javax.xml.xpath.*;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class MedlineParser {

    public static void main(String[] args) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder;
        Document doc = null;
        try {
            builder = factory.newDocumentBuilder();
            doc = builder.parse("/home/andrej/Documents/test.xml");
            // Create XPathFactory object
            XPathFactory xpathFactory = XPathFactory.newInstance();
            // Create XPath object
            XPath xpath = xpathFactory.newXPath();
            try {
                XPathExpression expr1 = xpath.compile("/RecordSet/Record/ID/text()");
                NodeList nodes1 = (NodeList) expr1.evaluate(doc, XPathConstants.NODESET);
                for (int i = 0; i < nodes1.getLength(); i++) {
                    String id = nodes1.item(i).getNodeValue();
                    XPathExpression expr2 = xpath.compile("/RecordSet/Record/TermList/Term/text()");
                    NodeList nodes2 = (NodeList) expr2.evaluate(doc, XPathConstants.NODESET);
                    for (int j = 0; j < nodes2.getLength(); j++) {
                        System.out.println(id + " " + nodes2.item(i).getNodeValue());
                    }
                }
            } catch (XPathExpressionException e) {
                e.printStackTrace();
            }

        } catch (IOException | ParserConfigurationException | SAXException e) {
            e.printStackTrace();
        }
    }
}

Unfortunately, the program output is currently:

001 Term1
001 Term1
001 Term1
001 Term1
001 Term1
001 Term1
002 Term2
002 Term2
002 Term2
002 Term2
002 Term2
002 Term2

Any idea what's wrong with my XPath expressions?

Upvotes: 2

Views: 4301

Answers (2)

wero

Reputation: 32980

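It seems that you are printing the Cartesian product of all IDs and terms: the second XPath expression does not depend on the current ID, so every ID is paired with every term.

A simpler approach: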

  1. Select and loop over all Record nodes with the XPath expression /RecordSet/Record.
  2. For each Record node, select the ID (with the relative XPath ID) and the terms (with the relative XPath TermList/Term), using the Record node as the context node.
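A minimal sketch of the two steps above, assuming the standard JAXP XPath API. The class name `RecordTermParser`, the method `idTermLines` and the inline `SAMPLE_XML` string (a copy of the question's XML, so the example runs without a file) are hypothetical; with a real file you would call `builder.parse(path)` as in the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RecordTermParser {

    // Hypothetical inline copy of the question's XML so the sketch is self-contained.
    static final String SAMPLE_XML = "<RecordSet>"
            + "<Record><ID>001</ID><TermList>"
            + "<Term>Term1</Term><Term>Term2</Term><Term>Term3</Term>"
            + "</TermList></Record>"
            + "<Record><ID>002</ID><TermList>"
            + "<Term>Term3</Term><Term>Term4</Term><Term>Term5</Term>"
            + "</TermList></Record>"
            + "</RecordSet>";

    static List<String> idTermLines(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();

        // Compile each expression once; "ID" and "TermList/Term" are relative paths.
        XPathExpression recordsExpr = xpath.compile("/RecordSet/Record");
        XPathExpression idExpr = xpath.compile("ID");
        XPathExpression termsExpr = xpath.compile("TermList/Term");

        List<String> lines = new ArrayList<>();
        NodeList records = (NodeList) recordsExpr.evaluate(doc, XPathConstants.NODESET);
        for (int i = 0; i < records.getLength(); i++) {
            Node record = records.item(i);
            // Evaluating with the Record node as context keeps each ID
            // paired with only its own terms.
            String id = idExpr.evaluate(record);
            NodeList terms = (NodeList) termsExpr.evaluate(record, XPathConstants.NODESET);
            for (int j = 0; j < terms.getLength(); j++) {
                lines.add(id + " " + terms.item(j).getTextContent());
            }
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        for (String line : idTermLines(SAMPLE_XML)) {
            System.out.println(line);
        }
    }
}
```

Because the ID and terms are both looked up relative to the same Record node, this approach stays correct even if a record has no terms or a different number of terms than its neighbours.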

Upvotes: 1

M A

Reputation: 72844

Two issues:

  1. The XPath must take into account the index of the ID node being iterated in the first loop. Your current XPath selects all Term nodes every time, for each ID node. You should change it to something like:

    XPathExpression expr2 = xpath.compile("/RecordSet/Record[" + (i + 1) + "]/TermList/Term/text()");
    
  2. You have a typo in the inner for loop. You should use j instead of i:

    for (int j = 0; j < nodes2.getLength(); j++) {
        System.out.println(id + " " + nodes2.item(j).getNodeValue());
    }
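For reference, here is a sketch of the question's program with both fixes applied, refactored so that the lines are collected into a list instead of printed directly. The class name `IndexedXPathParser`, the method `idTermLines` and the inline `SAMPLE_XML` string are hypothetical; the XPath expressions are the ones from the question and the first fix.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class IndexedXPathParser {

    // Hypothetical inline copy of the question's XML so the sketch is self-contained.
    static final String SAMPLE_XML = "<RecordSet>"
            + "<Record><ID>001</ID><TermList>"
            + "<Term>Term1</Term><Term>Term2</Term><Term>Term3</Term>"
            + "</TermList></Record>"
            + "<Record><ID>002</ID><TermList>"
            + "<Term>Term3</Term><Term>Term4</Term><Term>Term5</Term>"
            + "</TermList></Record>"
            + "</RecordSet>";

    static List<String> idTermLines(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();

        List<String> lines = new ArrayList<>();
        XPathExpression expr1 = xpath.compile("/RecordSet/Record/ID/text()");
        NodeList nodes1 = (NodeList) expr1.evaluate(doc, XPathConstants.NODESET);
        for (int i = 0; i < nodes1.getLength(); i++) {
            String id = nodes1.item(i).getNodeValue();
            // Fix 1: XPath positions are 1-based, so Java index i maps to Record[i + 1].
            XPathExpression expr2 = xpath.compile(
                    "/RecordSet/Record[" + (i + 1) + "]/TermList/Term/text()");
            NodeList nodes2 = (NodeList) expr2.evaluate(doc, XPathConstants.NODESET);
            for (int j = 0; j < nodes2.getLength(); j++) {
                // Fix 2: index the inner list with j, not i.
                lines.add(id + " " + nodes2.item(j).getNodeValue());
            }
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        idTermLines(SAMPLE_XML).forEach(System.out::println);
    }
}
```

Note that this version assumes every Record has exactly one ID; if a record were missing its ID, the i-th ID would no longer line up with Record[i + 1].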
    

Upvotes: 1
