Fernando Moyano

Reputation: 1107

How to unformat xml file

I have a method that returns a String containing formatted XML. The method reads the XML from a file on the server into the string.

Essentially, what the method currently does is:

  private ServletConfig config;
  InputStream xmlIn = null ;
  xmlIn = config.getServletContext().getResourceAsStream(filename + ".xml") ; 
  String xml = IOUtils.toString(xmlIn);
  IOUtils.closeQuietly(xmlIn);
  return xml;  

What I need to do is add a new input argument and, based on its value, either keep returning the formatted XML or return unformatted XML.

By formatted XML I mean something like:

<xml>
  <root>
    <elements>
      <elem1/>
      <elem2/>
    </elements>
  </root>
</xml>

And by unformatted XML I mean something like:

<xml><root><elements><elem1/><elem2/></elements></root></xml>

or:

<xml>
<root>
<elements>
<elem1/>
<elem2/>
</elements>
</root>
</xml>

Is there a simple way to do this?

Upvotes: 1

Views: 9827

Answers (7)

Oz Shabat

Reputation: 1622

Kotlin.

Indentation usually appears as a newline followed by one or more spaces. To collapse the document onto a single line, replace each newline, together with the spaces that follow it, with a single space:

xmlTag = xmlTag.replace("(\n +)".toRegex(), " ")

Upvotes: 0

Alpedar

Reputation: 1344

You can:

1) Collapse all runs of consecutive whitespace (leaving single whitespace characters alone) and then replace every >(whitespace)< with ><. This is applicable only if the useful content does not contain multiple consecutive significant whitespace characters.

2) Read it into a DOM tree and serialize it using a non-pretty serializer, for example with dom4j:

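    // dom4j: org.dom4j.Document, org.dom4j.io.{SAXReader, OutputFormat, XMLWriter}
    SAXReader reader = new SAXReader();
    Reader r = new StringReader(data);
    Document document = reader.read(r);
    // Compact format: no indentation and no newlines between elements.
    OutputFormat format = OutputFormat.createCompactFormat();
    StringWriter sw = new StringWriter();
    XMLWriter writer = new XMLWriter(sw, format);
    writer.write(document);
    writer.flush();
    // Read the result from the StringWriter; XMLWriter.toString() does not return the XML.
    String string = sw.toString();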

3) Use canonicalization (but you must somehow tell it that the whitespace you want to remove is insignificant).
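
Option 1 above can be sketched with a single regular expression. This is only a minimal sketch: the class and method names are made up for illustration, and it assumes the document has no mixed content, comments, or CDATA sections in which whitespace between a > and a < is significant.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class InterTagWhitespace {
    // One or more whitespace characters sitting between a '>' and the next '<'.
    private static final Pattern BETWEEN_TAGS = Pattern.compile(">\\s+<");

    /** Removes whitespace-only gaps between a '>' and the following '<'. */
    static String strip(String xml) {
        return BETWEEN_TAGS.matcher(xml).replaceAll("><");
    }

    public static void main(String[] args) {
        String pretty = "<root>\n  <elements>\n    <elem1/>\n    <elem2/>\n  </elements>\n</root>";
        System.out.println(strip(pretty));
    }
}
```

Text inside an element, such as the two spaces in <a>hello  world</a>, is not touched because it is not enclosed by > and <.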

Upvotes: 0

Chris Dennett

Reputation: 22741

Try something like the following:

TransformerFactory factory = TransformerFactory.newInstance();
Transformer transformer = factory.newTransformer(
    new StreamSource(new StringReader(
        "<xsl:stylesheet version=\"1.0\"" +
        "   xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">" + 
        "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>" +
        "  <xsl:strip-space elements=\"*\"/>" + 
        "  <xsl:template match=\"@*|node()\">" +
        "   <xsl:copy>" +
        "    <xsl:apply-templates select=\"@*|node()\"/>" +
        "   </xsl:copy>" +
        "  </xsl:template>" +
        "</xsl:stylesheet>"
    ))
);
Source source = new StreamSource(new StringReader("xml string here"));
StreamResult result = new StreamResult(System.out);
transformer.transform(source, result);

The second source (the input document) does not have to be a StreamSource. If you have an in-memory Document, for example because you want to modify the DOM before saving, it can be a DOMSource instead:

DOMSource source = new DOMSource(document);

To read an XML file into a Document object:

File file = new File("c:\\MyXMLFile.xml");
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(file);
doc.getDocumentElement().normalize();
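
The stylesheet approach above can be wrapped in a method that takes the flag the question asks for and returns a String instead of printing to System.out. This is a sketch under assumptions: the names XsltCompactor and render are invented here, and the exact serialization (for example how empty elements are written) depends on the XSLT processor on the classpath.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper built around the identity stylesheet with xsl:strip-space.
final class XsltCompactor {
    private static final String STRIP_SPACE_XSLT =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:strip-space elements=\"*\"/>"
        + "<xsl:template match=\"@*|node()\">"
        + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Returns the input unchanged when formatted is true, otherwise a whitespace-stripped copy. */
    static String render(String xml, boolean formatted) {
        if (formatted) {
            return xml;
        }
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STRIP_SPACE_XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Wrapped so callers do not have to handle a checked exception.
            throw new IllegalArgumentException("Could not transform XML", e);
        }
    }
}
```

Calling XsltCompactor.render(xml, false) yields the one-line form, while render(xml, true) hands back the file content as it was read.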

Enjoy :)

Upvotes: 1

tobiasbayer

Reputation: 10389

Strip all newline characters with String xml = IOUtils.toString(xmlIn).replace("\n", ""). Or replace "\t" instead to keep the separate lines but drop tab indentation. Note that neither call removes indentation made of spaces.
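
If the indentation may consist of spaces as well as tabs, one regular expression can drop each line break together with the indentation that follows it. A minimal sketch, assuming Java 8 or later (for \R) and assuming no element contains multi-line text whose line breaks matter; the class and method names are hypothetical.

```java
// Hypothetical helper, not part of Commons IO or the JDK.
final class LineJoiner {
    /** Drops every line break together with the spaces or tabs that follow it. */
    static String joinLines(String xml) {
        // \R matches any line break (\n, \r\n, \r); [ \t]* consumes the indentation after it.
        return xml.replaceAll("\\R[ \\t]*", "");
    }

    public static void main(String[] args) {
        System.out.println(joinLines("<xml>\n  <root>\n    <elem1/>\n  </root>\n</xml>"));
    }
}
```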

Upvotes: 2

Newtopian

Reputation: 7732

Use an empty (identity) transformer and set its indent output property, like so:

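import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public static String getStringFromDocument(Document dom, boolean indented) {
    String signedContent = null;
    try {
        StringWriter sw = new StringWriter();
        DOMSource domSource = new DOMSource(dom);
        // Use the standard JAXP factory rather than a vendor-specific implementation class.
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer trans = tf.newTransformer();
        trans.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        // Toggle pretty-printing based on the caller's flag.
        trans.setOutputProperty(OutputKeys.INDENT, indented ? "yes" : "no");

        trans.transform(domSource, new StreamResult(sw));
        signedContent = sw.toString();
    } catch (TransformerException e) {
        e.printStackTrace();
    }
    return signedContent;
}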

This works for me.

The key is this line:

 trans.setOutputProperty(OutputKeys.INDENT, indented ? "yes" : "no");
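
One caveat worth checking: setting INDENT to "no" only tells the serializer not to add whitespace. If the Document was parsed from an already indented file, the existing whitespace-only text nodes are still in the tree and will be written out again. A possible sketch that removes them first is shown below; the class name DomCompactor is hypothetical, and it assumes whitespace-only text nodes are never significant in your documents.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: parse, drop whitespace-only text nodes, serialize without indentation.
final class DomCompactor {
    static String compact(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            // Select text nodes that contain nothing but whitespace.
            NodeList blanks = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//text()[normalize-space(.) = '']", doc, XPathConstants.NODESET);
            for (int i = 0; i < blanks.getLength(); i++) {
                Node blank = blanks.item(i);
                blank.getParentNode().removeChild(blank);
            }
            Transformer trans = TransformerFactory.newInstance().newTransformer();
            trans.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            trans.setOutputProperty(OutputKeys.INDENT, "no");
            StringWriter sw = new StringWriter();
            trans.transform(new DOMSource(doc), new StreamResult(sw));
            return sw.toString();
        } catch (Exception e) {
            // Wrapped so callers do not have to handle the parser and transformer exceptions.
            throw new IllegalArgumentException("Could not compact XML", e);
        }
    }
}
```

Text nodes that contain any non-whitespace character, such as "hi there", are left exactly as they are.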

Upvotes: 1

Kent

Reputation: 195229

If you are sure that the formatted XML looks like this:

<xml>
  <root>
    <elements>
      <elem1/>
      <elem2/>
    </elements>
  </root>
</xml>

you can replace every match of group 1 in ^(\s*)< with "" (with ^ matching at the start of each line, i.e. in multiline mode). This way, the text content of the XML is not changed.
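
In Java this could look roughly like the sketch below. It is a variation rather than the exact expression above: it uses [ \t]+ instead of \s* so that blank lines are not swallowed, and a lookahead instead of a capture group. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class IndentStripper {
    // (?m) makes ^ match at the start of every line, not only at the start of the input.
    // The lookahead (?=<) restricts the match to indentation that directly precedes a tag.
    private static final Pattern LEADING_INDENT = Pattern.compile("(?m)^[ \\t]+(?=<)");

    /** Removes leading spaces and tabs from lines that start with a tag; line breaks are kept. */
    static String stripIndent(String xml) {
        return LEADING_INDENT.matcher(xml).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripIndent("<xml>\n  <root>\n    <elem1/>\n  </root>\n</xml>"));
    }
}
```

Lines that start with plain text rather than a tag keep their leading whitespace, which is why the element text is not altered.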

Upvotes: 2

Edd

Reputation: 8600

If you fancy trying your hand at JAXB, the marshaller has a handy property that controls whether the output is formatted (newlines and indentation) or not. Set it to Boolean.TRUE for formatted output, or to Boolean.FALSE (the default) for compact output.

JAXBContext jc = JAXBContext.newInstance(packageName);
Marshaller m = jc.createMarshaller();
m.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
m.marshal(element, outputStream);

It is quite an overhead to get to that stage, though. It is perhaps a good option if you already have a solid XSD.

Upvotes: 0
