Vitor Eiji

Reputation: 243

How to partially pretty print XML files from the command line?

I'm writing a Unix shell script where I need to pretty print XML files, but the catch is that there are portions of them that I may not touch: the Apache Jelly scripts embedded in the XML files must stay exactly as they are. So I need to convert this

<proc source="customer"><scriptParam value="_user"/><scriptText><jelly:script>

  <jelly:log level="info">
    this text needs
      to keep its indent level
        and this is none of my business
  </jelly:log>

  <!-- get date -->
  <sql:query var="rs"><![CDATA[
    select sysdate
    from dual
  ]]></sql:query>

</jelly:script>
</scriptText></proc>

Into this

<proc source="customer">
  <scriptParam value="_user"/>
  <scriptText>
<jelly:script>

  <jelly:log level="info">
    this text needs
      to keep its indent level
        and this is none of my business
  </jelly:log>

  <!-- get date -->
  <sql:query var="rs"><![CDATA[
    select sysdate
    from dual
  ]]></sql:query>

</jelly:script>
  </scriptText>
</proc>

Notice that the only change to the jelly:script element is the newline inserted before it.

I couldn't find any option in xmllint or xmlstarlet to ignore a certain element. Is there any tool that can help me achieve this? I'm on Linux, if it matters.

Upvotes: 1

Views: 1183

Answers (1)

PBI

Reputation: 335

If the requirement is that no whitespace inside the jelly:script element may change, you can use xml_pp (on Linux it is installed with the Perl package perl-XML-Twig). The option -p some-element preserves all whitespace inside those elements:

xml_pp -p jelly:script thefile.xml

That will create this:

<proc source="customer">
  <scriptParam value="_user"/>
  <scriptText>
    <jelly:script>

  <jelly:log level="info">
    this text needs
      to keep its indent level
        and this is none of my business
  </jelly:log>

  <!-- get date -->
  <sql:query var="rs"><![CDATA[
    select sysdate
    from dual
  ]]></sql:query>

</jelly:script>
  </scriptText>
</proc>

As you can see, the start tag <jelly:script> is also indented, because the added spaces sit outside the element.

If that is also forbidden, you must either preserve one level higher (scriptText) or pipe the output to a command that removes those spaces again:

xml_pp -p jelly:script thefile.xml | perl -pe 's/^\s*(<jelly:script>)/$1/'
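
The perl one-liner above only matches the bare tag <jelly:script>. If the start tag carries attributes (for example namespace declarations), a slightly more general filter may help. The following is only a sketch using sed instead of perl: the function name unindent_tag and the sample input are made up for illustration and are not part of xml_pp.

```shell
#!/bin/sh
# unindent_tag TAG: hypothetical helper (not part of xml_pp) that removes
# leading whitespace from lines beginning with the start tag <TAG ...>.
# Reads stdin and writes stdout; all other lines pass through unchanged.
unindent_tag() {
  # The tag name is interpolated into the regex, so it must not contain
  # regex metacharacters or '|'; names like jelly:script are fine.
  # The trailing bracket expression requires whitespace, '/' or '>' after
  # the name, so a longer name such as <jelly:scriptlet> is not touched.
  sed -E "s|^[[:space:]]+(<$1[[:space:]/>])|\1|"
}

# Small demonstration on made-up sample input.
printf '%s\n' \
  '  <scriptText>' \
  '    <jelly:script xmlns:sql="x">' \
  '  <jelly:log/>' |
  unindent_tag jelly:script
```

In the pipeline from this answer it would be used as: xml_pp -p jelly:script thefile.xml | unindent_tag jelly:script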

Upvotes: 1
