Reputation: 12205

How to remove #text from my Node parsing in Java DOM XML parsing

So I have the following code, which I pretty much copied from here. The problem is that my elements do not contain any text; they only have attributes. For example, I have:

<Random name="Katie" num="5"></Random>

and I'm using this code to parse it:

  private void listNodes(Node node, String indent)
  {
    String nodeName = node.getNodeName();
    System.out.println(indent + " Node is: " + nodeName);
    
    if(node instanceof Element && node.hasAttributes())
    {
      System.out.println(indent + "Attributes are: ");
      NamedNodeMap attrs = node.getAttributes();
      for (int i = 0; i < attrs.getLength(); i++) 
      {
        Attr attribute = (Attr) attrs.item(i);
        System.out.println(indent + attribute.getName() + "=" + attribute.getValue());
      }
    }
    
    NodeList list = node.getChildNodes(); 
    
    if (list.getLength() > 0) 
    {
      for (int i = 0; i < list.getLength(); i++)
      {
        listNodes(list.item(i), indent + " "); 
      } 
    }
  }

For some reason my empty text nodes all say

Node is: #text

Does anyone know how to skip empty text nodes when parsing the XML file?

Thanks,

Josh

Upvotes: 9

Answers (3)

FatherMathew

Reputation: 990

You can also use the Node.getNodeType() method for this purpose, and only process a node when it is an element:

// Inside listNodes(Node node, String indent):
if (node.getNodeType() == Node.ELEMENT_NODE) {
    // Your code inside this; #text nodes (Node.TEXT_NODE) never reach it
}
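A minimal, self-contained sketch of the node-type check, assuming listNodes is reshaped to collect its lines into a list so the result is easy to inspect. The class name ElementOnlyLister and the wrapping <root> element are hypothetical, added only to make the example runnable. Because the check comes first, text nodes are neither printed nor recursed into:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; not part of any library.
class ElementOnlyLister {

    // Records only element nodes; #text nodes (Node.TEXT_NODE) and any other
    // non-element nodes are skipped, along with everything below them.
    static void listElements(Node node, String indent, List<String> out) {
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return;
        }
        out.add(indent + "Node is: " + node.getNodeName());
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            listElements(children.item(i), indent + " ", out);
        }
    }

    // Parses the XML string and starts at the root element; the Document node
    // itself is a DOCUMENT_NODE and would be rejected by the check above.
    static List<String> listFromXml(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> out = new ArrayList<>();
            listElements(doc.getDocumentElement(), "", out);
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root>\n  <Random name=\"Katie\" num=\"5\"></Random>\n</root>";
        for (String line : listFromXml(xml)) {
            System.out.println(line);
        }
    }
}
```

Note that this filter also drops text nodes that do contain text; if you need those, test for whitespace-only Node.TEXT_NODE content instead.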

Upvotes: 9

Pankaj

Reputation: 592

'#text' is the name that getNodeName() returns for any text node; the ones you are seeing are empty (whitespace-only) text nodes. These can be identified using XPath and removed:

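import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// doc is the already-parsed org.w3c.dom.Document;
// compile() and evaluate() throw XPathExpressionException.
XPathFactory xpathFactory = XPathFactory.newInstance();
// XPath to find empty text nodes.
XPathExpression xpathExp = xpathFactory.newXPath().compile(
    "//text()[normalize-space(.) = '']");
NodeList emptyTextNodes = (NodeList)
    xpathExp.evaluate(doc, XPathConstants.NODESET);
// Remove each empty text node from document.
for (int i = 0; i < emptyTextNodes.getLength(); i++) {
  Node emptyTextNode = emptyTextNodes.item(i);
  emptyTextNode.getParentNode().removeChild(emptyTextNode);
}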

These whitespace-only text nodes are created by the line breaks and indentation between elements in the XML file.
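A runnable sketch of the XPath approach, assuming the document is parsed from a string; the class and method names below are hypothetical. One deliberate difference from the snippet above: the matches are copied into a List before any node is removed, so the tree is only changed after the XPath result has been fully read.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class illustrating the XPath approach end to end.
class EmptyTextRemover {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Removes every text node whose normalize-space() value is empty
    // and returns how many nodes were removed.
    static int removeEmptyTextNodes(Document doc) {
        try {
            XPathExpression expr = XPathFactory.newInstance().newXPath()
                    .compile("//text()[normalize-space(.) = '']");
            NodeList found = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
            // Copy the matches first, then modify the tree.
            List<Node> toRemove = new ArrayList<>();
            for (int i = 0; i < found.getLength(); i++) {
                toRemove.add(found.item(i));
            }
            for (Node n : toRemove) {
                n.getParentNode().removeChild(n);
            }
            return toRemove.size();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<root>\n  <Random name=\"Katie\" num=\"5\"></Random>\n</root>");
        int removed = removeEmptyTextNodes(doc);
        System.out.println("removed " + removed + " whitespace-only text nodes");
        System.out.println("root children left: "
                + doc.getDocumentElement().getChildNodes().getLength());
    }
}
```

Text nodes with real content (for example " hi ") are left alone, because normalize-space() of them is not empty.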

Upvotes: 5

robert_x44

Reputation: 9314

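With DTD validation (DocumentBuilderFactory.setValidating(true) together with setIgnoringElementContentWhitespace(true)), you can have the parser automatically suppress the whitespace between elements. However, to modify your specific implementation, you can test for Text nodes and ignore them if they are empty or contain only whitespace.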

private void listNodes(Node node, String indent)
{
    if (node instanceof Text) {
        // Skip text nodes that hold only whitespace (line breaks, indentation)
        String value = node.getNodeValue().trim();
        if (value.isEmpty()) {
            return;
        }
    }

    String nodeName = node.getNodeName();
    System.out.println(indent + " Node is: " + nodeName);
    ...
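A complete version of the question's listNodes with this check applied might look like the sketch below. It assumes that collecting the lines into a list is acceptable in place of System.out.println, and the class name NodeLister is hypothetical. Text nodes with real content are still listed; only whitespace-only ones are skipped.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

// Hypothetical class: the question's listNodes plus the whitespace check,
// collecting lines into a list instead of printing them directly.
class NodeLister {

    static void listNodes(Node node, String indent, List<String> out) {
        // Skip text nodes that contain nothing but whitespace.
        if (node instanceof Text && node.getNodeValue().trim().isEmpty()) {
            return;
        }
        out.add(indent + " Node is: " + node.getNodeName());

        if (node instanceof Element && node.hasAttributes()) {
            out.add(indent + "Attributes are: ");
            NamedNodeMap attrs = node.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Attr attribute = (Attr) attrs.item(i);
                out.add(indent + attribute.getName() + "=" + attribute.getValue());
            }
        }

        NodeList list = node.getChildNodes();
        for (int i = 0; i < list.getLength(); i++) {
            listNodes(list.item(i), indent + " ", out);
        }
    }

    static List<String> listFromXml(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> out = new ArrayList<>();
            listNodes(doc.getDocumentElement(), "", out);
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root>\n  <Random name=\"Katie\" num=\"5\"></Random>\n</root>";
        for (String line : listFromXml(xml)) {
            System.out.println(line);
        }
    }
}
```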

Upvotes: 11
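The DTD route mentioned at the start of robert_x44's answer can be sketched as follows. The internal DTD and the DtdWhitespaceDemo class are assumptions made for illustration: setIgnoringElementContentWhitespace(true) only drops whitespace that the DTD marks as ignorable (element-only content such as (Random*)), and it only takes effect when the factory is validating.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical demo class for the DTD-based approach.
class DtdWhitespaceDemo {

    // Assumed internal DTD: <root> may hold only <Random> elements, so the
    // whitespace between them counts as ignorable element-content whitespace.
    static final String XML =
            "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE root [\n"
            + "  <!ELEMENT root (Random*)>\n"
            + "  <!ELEMENT Random EMPTY>\n"
            + "  <!ATTLIST Random name CDATA #IMPLIED num CDATA #IMPLIED>\n"
            + "]>\n"
            + "<root>\n"
            + "  <Random name=\"Katie\" num=\"5\"/>\n"
            + "</root>";

    static Document parse(String xml, boolean ignoreWhitespace) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // the whitespace flag needs a validating parser
            factory.setIgnoringElementContentWhitespace(ignoreWhitespace);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Fail on validation errors instead of printing them to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("root children, flag off: "
                + parse(XML, false).getDocumentElement().getChildNodes().getLength());
        System.out.println("root children, flag on: "
                + parse(XML, true).getDocumentElement().getChildNodes().getLength());
    }
}
```

Without a DTD that declares element-only content, the parser has no way to know the whitespace is insignificant, which is why the Text-node check above is the simpler fix for an existing file.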
