cartman

Reputation: 203

xml parse string match Java

I'm trying to parse a bunch of XML files in a folder and return all the tags whose content matches a particular expression. Below is what I have so far:

import java.io.File;
import java.io.IOException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class MyDomParser {

    public static void main(String[] args) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        try {
            File folder = new File("C:\\Users\\xmlfolder");

            DocumentBuilder builder = factory.newDocumentBuilder();
            for (File workfile : folder.listFiles()) {
                if (workfile.isFile()) {
                    // Parse each file into a DOM tree; the tag matching is still missing here.
                    Document doc = builder.parse(workfile);
                }
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            e.printStackTrace();
        }
    }
}

How do I loop through all the tags in each XML file and return the ones whose content matches the expression "/server[^<]*"?

Any help is much appreciated.

Upvotes: 1

Views: 436

Answers (2)

Michael Kay

Reputation: 163262

This is a job for XQuery. It's a one-liner:

collection('file://my-folder/?recurse=yes;select=*.xml')//*[matches(., '/server[^<]*')]

The syntax of collection URIs may vary from one XQuery implementation to another; the above works with Saxon.

Parsing each of the files using DOM and then navigating them using DOM interfaces is just absurdly inefficient both in terms of your time and in terms of machine performance.

You can of course invoke XQuery from Java, and get the results back in a form that Java can manipulate.
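To give a rough idea of what "run a query from Java and get nodes back" looks like, here is a minimal sketch. Note that it is not XQuery: it uses the XPath 1.0 engine that ships with the JDK (javax.xml.xpath), which has no matches() function, so a plain contains() test on '/server' stands in for the regular expression. The class name XPathServerSearch, the helper matchingTagNames and the sample XML are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class name, used only for this illustration.
public class XPathServerSearch {

    // XPath 1.0 stand-in for the regex: elements that have a text node containing "/server".
    private static final String QUERY = "//*[text()[contains(., '/server')]]";

    /** Parses the given XML string and returns the names of the matching elements. */
    public static List<String> matchingTagNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // NODESET makes the engine hand back an org.w3c.dom.NodeList.
            NodeList nodes = (NodeList) xpath.evaluate(QUERY, doc, XPathConstants.NODESET);

            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getNodeName());
            }
            return names;
        } catch (Exception e) {
            // Checked parser/XPath exceptions are wrapped to keep the sketch short.
            throw new IllegalArgumentException("Could not query XML", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample document.
        String xml = "<root><a>/server/one</a><b>other</b><c><d>see /server2</d></c></root>";
        System.out.println(matchingTagNames(xml));
    }
}
```

With an XQuery processor such as Saxon on the classpath the same shape applies (compile the query, evaluate it, iterate over the result), with the full regular expression available through matches().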

Upvotes: 0

Michael Markidis

Reputation: 4191

You could create a separate method that recursively goes through all nodes in the current XML file and adds the matched tags to a List of Nodes.

Example:

public static void parseTags (Node node, List<Node> list)
{
      NodeList nodeList = node.getChildNodes();
      for (int i = 0; i < nodeList.getLength(); i++)
      {
           Node n = nodeList.item(i);
           if (n.getNodeType() == Node.ELEMENT_NODE)
           {
               String content = n.getTextContent();

               // String.matches() succeeds only when the whole text content
               // (including the text of nested tags) matches the pattern
               if (content.matches("/server[^<]*"))
               {
                   list.add(n);
               }
               parseTags(n, list);
           }
      }
}

You can call this method in your existing code like this:

// create your list outside the loop like this:
List<Node> list = new ArrayList<Node>();

for(File workfile : folder.listFiles())
{
    if(workfile.isFile())
    {
        Document doc = builder.parse(workfile);

        // call the recursive method here:
        parseTags(doc.getDocumentElement(), list);
    }
}
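Two details of the recursive method above are worth keeping in mind: String.matches() only succeeds when the entire string matches the pattern, and getTextContent() also includes the text of all nested tags, so a parent can match because of its children. If you want every tag whose own text contains the expression anywhere, a variant along these lines may be closer. It is only a sketch: the class name ServerTagFinder, the helper findInXml and the sample XML are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class name, used only for this illustration.
public class ServerTagFinder {

    // Same expression as in the question; find() looks for it anywhere in the text.
    private static final Pattern SERVER = Pattern.compile("/server[^<]*");

    /** Collects, in document order, the elements whose own text contains a match. */
    public static void collectMatches(Element element, List<Element> hits) {
        NodeList children = element.getChildNodes();

        // Concatenate only the direct text/CDATA children, not the text of nested tags.
        StringBuilder ownText = new StringBuilder();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                ownText.append(child.getNodeValue());
            }
        }
        if (SERVER.matcher(ownText).find()) {
            hits.add(element);
        }

        // Recurse into child elements.
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                collectMatches((Element) child, hits);
            }
        }
    }

    /** Hypothetical helper: parses an XML string and returns the matching elements. */
    public static List<Element> findInXml(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            List<Element> hits = new ArrayList<>();
            collectMatches(doc.getDocumentElement(), hits);
            return hits;
        } catch (Exception e) {
            // Checked parser exceptions are wrapped to keep the sketch short.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample document.
        String xml = "<root><a>/server/one</a><b>other</b><c><d>see /server2</d></c></root>";
        for (Element e : findInXml(xml)) {
            System.out.println(e.getTagName() + ": " + e.getTextContent());
        }
    }
}
```

In the file loop you would call collectMatches(doc.getDocumentElement(), list) with a List<Element> instead of a List<Node>. Unlike the version above, this also checks the root element itself.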

Upvotes: 1
