parag mittal

Reputation: 233

Skip deserialization of an element and get its whole content as a string while parsing XML in Java

I have an XML document like the one below:

<content>
  <p><b>Node:</b> Some information</p>
</content>

When deserializing this XML, I want to get the content inside the p tag as a string.

For example, if I have a Java class like the one below:

@Data
class Content {
  TextInParagraph p;
}

@Data
class TextInParagraph {
  String text;
}

then text should have the value "<b>Node:</b> Some information".

Is there a way to do this using JAXB or the Jackson XML parser?

I tried deserializing the XML above with Jackson, but I get the exception below:

Expected END_ELEMENT, got event of type 1
java.io.IOException: Expected END_ELEMENT, got event of type 1

Upvotes: 2

Views: 876

Answers (1)

thunderhook

Reputation: 580

Sadly, this is not possible with jackson-dataformat-xml.

With JAXB, however, you can solve this by using a DomHandler:

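import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlAnyElement;
import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement(name = "content")
@XmlAccessorType(XmlAccessType.FIELD)
public class Content {

    // Any otherwise unmapped child element (here <p>) is handed to InnerXmlHandler.
    @XmlAnyElement(InnerXmlHandler.class)
    private String p;
}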

DomHandler

import javax.xml.bind.ValidationEventHandler;
import javax.xml.bind.annotation.DomHandler;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.io.StringWriter;

public class InnerXmlHandler implements DomHandler<String, StreamResult> {

    // Assumes the element is serialized exactly as <p>...</p>, without attributes.
    private static final String START_TAG = "<p>";
    private static final String END_TAG = "</p>";

    @Override
    public StreamResult createUnmarshaller(ValidationEventHandler errorHandler) {
        // Use a fresh writer per element so content from earlier elements is not carried over.
        return new StreamResult(new StringWriter());
    }

    @Override
    public String getElement(StreamResult rt) {
        // The writer holds the serialized <p> element; keep only what lies between
        // the first start tag and the last end tag.
        String xml = rt.getWriter().toString();
        int beginIndex = xml.indexOf(START_TAG) + START_TAG.length();
        int endIndex = xml.lastIndexOf(END_TAG);
        return xml.substring(beginIndex, endIndex);
    }

    @Override
    public Source marshal(String n, ValidationEventHandler errorHandler) {
        // Wrap the stored string in <p>...</p> again so it can be written back as XML.
        String xml = START_TAG + n.trim() + END_TAG;
        return new StreamSource(new StringReader(xml));
    }
}

This works with the sample you provided, and it even works with nested <p> tags like:

<content>
  <p> This is some <ul><li>list</li></ul> and <p>nested paragraph</p></p>
</content>
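
The nested case works because getElement cuts between the first <p> and the last </p>. Here is a minimal standalone sketch of just that string logic, without JAXB. The class name InnerXmlExtractDemo and the serialized input string are made up for illustration; the exact text JAXB writes into the StreamResult may differ.

```java
class InnerXmlExtractDemo {

    private static final String START_TAG = "<p>";
    private static final String END_TAG = "</p>";

    // Same index arithmetic as InnerXmlHandler.getElement, applied to a plain string.
    static String innerXml(String xml) {
        int beginIndex = xml.indexOf(START_TAG) + START_TAG.length(); // just after the first <p>
        int endIndex = xml.lastIndexOf(END_TAG);                      // start of the last </p>
        return xml.substring(beginIndex, endIndex);
    }

    public static void main(String[] args) {
        // Hypothetical serialized form of the <p> element, including an XML declaration.
        String serialized = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<p> This is some <ul><li>list</li></ul> and <p>nested paragraph</p></p>";
        // Prints the content with the inner <p>nested paragraph</p> left intact.
        System.out.println(innerXml(serialized));
    }
}
```

Note that this only holds when the outer start tag is written exactly as <p>. If it carried attributes, indexOf would not find it and the substring would be wrong.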

However, this works only when the inner HTML/XML is well-formed. The following will not work and throws an exception like: The element type "ul" must be terminated by the matching end-tag "</ul>".

<content>
  <p> This is some <ul>invalid xml </p>
</content>

This is because JAXB's internals traverse all inner elements even though a DomHandler is provided.
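
To check up front whether a document would hit this problem, you can run it through a plain XML parser first. The sketch below uses the JDK's built-in DocumentBuilder rather than JAXB itself, so it only illustrates the well-formedness requirement and not JAXB's actual code path. The class name WellFormedCheck and the method isWellFormed are made up for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class WellFormedCheck {

    // Returns true if the string parses as XML, false if the parser rejects it.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Parse errors such as an unclosed <ul> end up here.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String broken = "<content><p> This is some <ul>invalid xml </p></content>";
        String nested = "<content><p> This is some <ul><li>list</li></ul>"
                + " and <p>nested paragraph</p></p></content>";
        System.out.println(isWellFormed(broken)); // false
        System.out.println(isWellFormed(nested)); // true
    }
}
```

The JDK's default error handler may also print the parse error to stderr. If the check returns false, the content has to be repaired or escaped (for example wrapped in CDATA) before JAXB can read it.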

Upvotes: 2
