Joseph

Reputation: 169

XML parsing: read an XML tag as text content

I have this sample XML file:

<Cells>

          <Cell row="1" column="1">p</Cell>     
</Cells>

Here, p is the content of the cell. Sometimes, though, I need to put XML tags inside the content, and I want to read them as plain text rather than as XML tags, something like this:

 <Cells>
    <Cell row="1" column="1">p</Cell>  
    <Cell row="2" column="2"><Cell></Cell>  
    <Cell row="3" column="3"></Cell></Cell>   
 </Cells>

How can I do this? To read this XML I use something like this:

    doc.getDocumentElement().normalize();

    // Every <Cell> element in the document, nested ones included.
    NodeList nList = doc.getElementsByTagName("Cell");

    // Indices 1-3 hold row, column, and text; index 0 is not set here.
    cell = new String[nList.getLength()][4];

    for (int temp = 0; temp < nList.getLength(); temp++) {
        Node nNode = nList.item(temp);
        if (nNode.getNodeType() == Node.ELEMENT_NODE) {
            Element eElement = (Element) nNode;
            cell[temp][1] = eElement.getAttribute("row");
            cell[temp][2] = eElement.getAttribute("column");
            // Returns the concatenated text of the element; child tags are not included.
            cell[temp][3] = eElement.getTextContent();
        }
    }

So is there any way to read <Cell> or </Cell> inside a <Cell> ... </Cell> as content and not as an XML tag?

Thank you!

Upvotes: 0

Views: 934

Answers (1)

Jim Garrison

Reputation: 86774

When using a Java XML parser it is required that the input be well-formed XML. This is because the in-memory document consists of nodes and attributes (and a few other things) and NOT tags in any way that resembles the input text stream.

The text stream is the serialized version of an abstract "thing" known as an XML document. Once it has been parsed into a DOM, the details of how it looked in the serialized file are gone, and all that remains is the semantic structure and content. There are no "tags" (start or end); they are artifacts of the serialization, not part of the semantic content.
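To illustrate the point about serialization details disappearing, here is a minimal sketch. It assumes you control how the file is written, so that a literal tag inside a cell can be stored either with entity escapes (`&lt;Cell&gt;`) or in a CDATA section. Both are well-formed, and after parsing both are just text; the DOM does not remember which spelling the file used. The class name `EscapedCellDemo` and the helper `cellTexts` are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class EscapedCellDemo {
    // Hypothetical helper: parses the XML string and returns the text
    // content of every <Cell> element, in document order.
    static String[] cellTexts(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList cells = doc.getElementsByTagName("Cell");
        String[] texts = new String[cells.getLength()];
        for (int i = 0; i < cells.getLength(); i++) {
            texts[i] = ((Element) cells.item(i)).getTextContent();
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Cells>"
                + "<Cell row=\"1\" column=\"1\">p</Cell>"
                // Escaped markup: stored as the characters < C e l l >
                + "<Cell row=\"2\" column=\"2\">&lt;Cell&gt;</Cell>"
                // CDATA section: everything up to ]]> is literal text
                + "<Cell row=\"3\" column=\"3\"><![CDATA[</Cell>]]></Cell>"
                + "</Cells>";
        for (String text : cellTexts(xml)) {
            System.out.println(text);
        }
    }
}
```

With this input the three cells should come back as `p`, `<Cell>`, and `</Cell>`. This only helps if whoever produces the file can escape the content; it does nothing for a file that already contains raw, unescaped tags.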

If you need to treat a sub-tree in its serialized version you could write a custom SAX (event driven) parser to handle the tag events and maintain the serialized text, but that would be rather complex. You might also be able to re-serialize the subtree at the point where you need it in serialized form. This would also be "interesting".
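The re-serialization idea can be sketched roughly as follows, assuming the nested markup is itself well-formed. The sketch uses the JDK's identity `Transformer` to write each child node of an element back out as text. The names `InnerXmlDemo`, `parse`, and `innerXml` are invented for this example, and the result is a fresh serialization rather than the original bytes: whitespace, quoting, and empty-element style may differ from the input file.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class InnerXmlDemo {
    // Hypothetical helper: parses an XML string into a DOM document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Hypothetical helper: serializes each child node of the element
    // and concatenates the results into one string.
    static String innerXml(Element element) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            transformer.transform(new DOMSource(children.item(i)), new StreamResult(out));
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(
                "<Cells><Cell row=\"1\" column=\"1\"><Cell>inner</Cell> tail</Cell></Cells>");
        // getElementsByTagName returns nodes in document order,
        // so item(0) is the outer <Cell>.
        Element outer = (Element) doc.getElementsByTagName("Cell").item(0);
        System.out.println("text content: " + outer.getTextContent());
        System.out.println("inner XML:    " + innerXml(outer));
    }
}
```

Note that `getElementsByTagName("Cell")` also returns the nested `<Cell>`, so a loop over all cells would have to skip elements whose parent is itself a `Cell`.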

If you need to process XML that is not well-formed (i.e. missing end tags or with other syntax errors) you cannot use a standard parser at all. It will fail to parse the document and throw an exception.
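A small sketch of that failure mode, using the standard JAXP `DocumentBuilder`: the parser reports input that is not well-formed as a `SAXParseException` (a subclass of `SAXException`) instead of returning a partial document. The class name `MalformedXmlDemo` and the helper `isWellFormed` are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class MalformedXmlDemo {
    // Hypothetical helper: returns true if the parser accepts the input,
    // false if it rejects it as not well-formed.
    static boolean isWellFormed(String xml)
            throws ParserConfigurationException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors without printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // No Document is produced for input that is not well-formed.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String good = "<Cells><Cell row=\"1\" column=\"1\">p</Cell></Cells>";
        // The inner <Cell> start tag is never closed before </Cells>.
        String bad = "<Cells><Cell row=\"2\" column=\"2\"><Cell></Cell></Cells>";
        System.out.println(isWellFormed(good));
        System.out.println(isWellFormed(bad));
    }
}
```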

In short, what you are trying to do is outside the scope of Java-based XML parsers, and there are no good answers that do not involve lots of work.

Upvotes: 1
