Reputation: 14439

get node raw text

How can I get a node's value together with its child nodes? For example, I have the following node parsed into a DOM Document instance:

<root>
    <ch1>That is a text with <value name="val1">value contents</value></ch1>
</root>

I select the ch1 node using XPath. Now I need to get its contents, i.e. everything between <ch1> and </ch1>: That is a text with <value name="val1">value contents</value>.

How can I do it?

Upvotes: 2

Answers (4)

Lukas Eder

Reputation: 220887

You could use jOOX to wrap your DOM objects and get many utility functions from it, including the one you need. In your case, the following produces the result you want (using CSS-style selectors to find <ch1/>):

String xml = $(document).find("ch1").content();

Or with XPath as you did:

String xml = $(document).xpath("//ch1").content();

Internally, jOOX uses a transformer to generate that output, as others have mentioned.

Upvotes: 1

user467257

Reputation: 1742

If this is server-side Java (i.e. you do not need to worry about it running on other JVMs) and you are using the Sun/Oracle JDK, you can do the following:

import com.sun.org.apache.xml.internal.serialize.OutputFormat;
import com.sun.org.apache.xml.internal.serialize.XMLSerializer;

...

Node n = ...;
OutputFormat outputFormat = new OutputFormat();
outputFormat.setOmitXMLDeclaration(true);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
XMLSerializer ser = new XMLSerializer(baos, outputFormat);
ser.serialize(n);
// Decode with the encoding the serializer wrote (OutputFormat defaults to UTF-8)
System.out.println(new String(baos.toByteArray(), "UTF-8"));

Note that the final byte-to-String conversion has to use the same encoding the serializer wrote with. If you rely on the platform default charset and it differs, any non-ASCII characters in the text nodes will come out garbled.
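If depending on the internal com.sun classes is a concern, the standard DOM Level 3 Load and Save API can do a similar job. The following is only a sketch, assuming the DOM implementation supports the "LS" feature (the one bundled with the JDK does); the class name LsSerializeDemo and the helpers outerXml and parse are made up for illustration. Because writeToString returns a String directly, there is no byte-to-String decoding step to get wrong.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

public class LsSerializeDemo {

    // Hypothetical helper: serializes a node (including its own tags) to a String.
    public static String outerXml(Node node) {
        Document owner = node.getNodeType() == Node.DOCUMENT_NODE
                ? (Document) node
                : node.getOwnerDocument();
        // Ask the DOM implementation for its Load and Save support.
        DOMImplementationLS ls =
                (DOMImplementationLS) owner.getImplementation().getFeature("LS", "3.0");
        LSSerializer serializer = ls.createLSSerializer();
        // Leave out the <?xml ...?> declaration.
        serializer.getDomConfig().setParameter("xml-declaration", Boolean.FALSE);
        return serializer.writeToString(node);
    }

    // Hypothetical helper: parses an XML string, wrapping checked exceptions.
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<root><ch1>That is a text with "
                + "<value name=\"val1\">value contents</value></ch1></root>");
        Node ch1 = doc.getElementsByTagName("ch1").item(0);
        System.out.println(outerXml(ch1));
    }
}
```

Like the XMLSerializer version, this serializes the ch1 element itself, so the surrounding <ch1> tags are part of the output.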

Upvotes: 1

michael nesterenko

Reputation: 14439

I found the following code snippet, which uses a transformation and gives almost exactly what I want. The result can be tuned by changing the output method.

public static String serializeDoc(Node doc) {
    StringWriter outText = new StringWriter();
    StreamResult sr = new StreamResult(outText);
    Properties oprops = new Properties();
    oprops.put(OutputKeys.METHOD, "xml");
    TransformerFactory tf = TransformerFactory.newInstance();
    Transformer t = null;
    try {
        t = tf.newTransformer();
        t.setOutputProperties(oprops);
        t.transform(new DOMSource(doc), sr);
    } catch (Exception e) {
        System.out.println(e);
    }
    return outText.toString();
}
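The snippet above serializes the node it is given, so for ch1 the output still includes the <ch1> tags and, by default, an XML declaration. One way to get only the inner content is to transform each child node in turn with the declaration switched off. This is a rough sketch under that assumption, not a definitive implementation; InnerXmlDemo, innerXml and parse are hypothetical names chosen for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class InnerXmlDemo {

    // Hypothetical helper: serializes every child of `parent` (text and
    // elements alike) and concatenates the results, leaving out the
    // parent's own start and end tags.
    public static String innerXml(Node parent) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            // Without this, each child would be prefixed with <?xml ...?>.
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            for (Node child = parent.getFirstChild(); child != null;
                    child = child.getNextSibling()) {
                t.transform(new DOMSource(child), new StreamResult(out));
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize node", e);
        }
    }

    // Hypothetical helper: parses an XML string, wrapping checked exceptions.
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<root><ch1>That is a text with "
                + "<value name=\"val1\">value contents</value></ch1></root>");
        Node ch1 = doc.getElementsByTagName("ch1").item(0);
        System.out.println(innerXml(ch1));
    }
}
```

For the example document this should yield the text followed by the nested value element, without the enclosing ch1 tags. Note that the output is re-serialized from the DOM, so details such as quote style or entity escaping may differ from the original source text.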

Upvotes: 2

Russell Zahniser

Reputation: 16364

As far as I know, there is no equivalent of innerHTML in Document. DOM is meant to hide the details of the markup from you.

You can probably get the effect you want by going through the children of that node. Suppose for example that you want to copy out the text, but replace each "value" tag with a programmatically supplied value:

HashMap<String, String> values = ...;
StringBuilder str = new StringBuilder();
for (Node child = ch1.getFirstChild(); child != null; child = child.getNextSibling()) {
    if(child.getNodeType() == Node.TEXT_NODE) {
        str.append(child.getTextContent());
    } else if(child.getNodeName().equals("value")) {
        str.append(values.get(child.getAttributes().getNamedItem("name").getTextContent()));
    }
}
String output = str.toString();

Upvotes: 0
