SGB

Reputation: 2260

How to remove empty tags in input XML

My Java module receives a huge input XML document from a mainframe. Unfortunately, the mainframe cannot skip optional elements, so I get a LOT of empty tags in my input:

So,

<SSN>111111111</SSN>
<Employment>
<Current>
<Address>
<line1/>
<line2/>
<line3/>
<city/>
<state/>
<country/>
</Address>
<Phone>
<phonenumber/>
<countryCode/>
</Phone>
</Current>
<Previous>
<Address>
<line1/>
<line2/>
<line3/>
<city/>
<state/>
<country/>    
</Address>
<Phone>
<phonenumber/>
<countryCode/>
</Phone>
</Previous>
</Employment>
<MaritalStatus>Single</MaritalStatus>

should be:

<SSN>111111111</SSN>
<MaritalStatus>Single</MaritalStatus>

I use JAXB to unmarshal the input XML string that the mainframe sends. Is there a clean, easy way to remove all the empty group tags, or do I have to do this manually in the code for each element? I have over 350 elements in my input XML, so I would love it if JAXB itself had a way of doing this automatically.

Thanks, SGB

Upvotes: 5

Views: 6538

Answers (5)

Stéphane GRILLON

Reputation: 11864

import java.util.regex.Pattern;

public class RemoveEmptyTags {

    public static void main(String[] args) {
        // Self-closing tags such as <city/>
        final Pattern selfClosing = Pattern.compile("<([a-zA-Z][a-zA-Z0-9]*)[^>]*/>");
        // Open/close pairs that contain only whitespace, such as <Address>   </Address>
        final Pattern emptyPair = Pattern.compile("<([a-zA-Z][a-zA-Z0-9]*)[^>]*>\\s*</\\1>");

        String xmlString = "<SSN>111111111</SSN><Employment><Current><Address><line1/><line2/><line3/><city/><state/><country/></Address><Phone><phonenumber/><countryCode/></Phone></Current><Previous><Address><line1/><line2/><line3/><city/><state/><country/>    </Address><Phone><phonenumber/><countryCode/></Phone></Previous></Employment><MaritalStatus>Single</MaritalStatus>";
        System.out.println(xmlString);

        // Removing empty children can leave their parent empty,
        // so repeat until a pass changes nothing.
        String previous;
        do {
            previous = xmlString;
            xmlString = selfClosing.matcher(xmlString).replaceAll("");
            xmlString = emptyPair.matcher(xmlString).replaceAll("");
        } while (!xmlString.equals(previous));

        System.out.println(xmlString);
    }
}

Console:

<SSN>111111111</SSN>
<Employment>
    <Current>
        <Address>
            <line1/>
            <line2/>
            <line3/>
            <city/>
            <state/>
            <country/>
        </Address>
        <Phone>
            <phonenumber/>
            <countryCode/>
        </Phone>
    </Current>
    <Previous>
        <Address>
            <line1/>
            <line2/>
            <line3/>
            <city/>
            <state/>
            <country/>
        </Address>
        <Phone>
            <phonenumber/>
            <countryCode/>
        </Phone>
    </Previous>
</Employment>
<MaritalStatus>Single</MaritalStatus>

<SSN>111111111</SSN>
<MaritalStatus>Single</MaritalStatus>


Upvotes: 1

mkl

Reputation: 11

OK, I stumbled on this by accident. A simple working solution with JAXB (at least for JDK 1.6.x):

Set the unwanted attribute or element to null, e.g. ...setEmployment(null); then the whole Employment structure is gone.

Cheers Masi

Upvotes: 1

skaffman

Reputation: 403481

The only technique I'm aware of in JAXB to do this is to write a custom XmlAdapter that collapses your empty strings to nulls.

The downside is that you'd have to add this as an annotation to every single element in your code, and if you have 350 of them, that's going to be tedious.

Upvotes: 1

Michael

Reputation: 35341

I think you'd have to edit your mainframe code for the best solution. When your mainframe generates the XML, you'll have to tell it not to output a tag if it's empty.

I don't think there's much you can do on the client side. If the XML you receive is filled with empty tags, you have no choice but to parse them all. After all, how can you tell whether a tag is empty without parsing it in some way?

But maybe you could do a regex string replace on the XML text before JAX-B gets to it:

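String xml = readInputXml(); // hypothetical helper that returns the XML text
// [^<>]* keeps each match inside a single tag, so neighbouring elements are untouched
xml = xml.replaceAll("<[^<>]*/>", "");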

This will remove empty tags like "<city/>" but not "<Address></Address>".
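
For the nested case, an alternative to regex is a DOM pass over the document before JAX-B sees it. This is only a minimal sketch using the JDK's built-in XML APIs. It assumes the fragment is wrapped in a single root element, and the class name `EmptyElementPruner` and the `<root>` wrapper are made up for illustration:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of JAXB or any library.
class EmptyElementPruner {

    // Prunes empty descendants bottom-up; returns true if the element itself ends up empty.
    static boolean pruneChildren(Element element) {
        Node child = element.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // remember the sibling before a possible removal
            if (child instanceof Element && pruneChildren((Element) child)) {
                element.removeChild(child);
            }
            child = next;
        }
        return isEmpty(element);
    }

    // "Empty" here means: no attributes, no child elements, and only blank text.
    static boolean isEmpty(Element element) {
        if (element.hasAttributes()) {
            return false;
        }
        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            short type = n.getNodeType();
            if (type == Node.ELEMENT_NODE) {
                return false;
            }
            boolean isText = type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE;
            if (isText && !n.getNodeValue().trim().isEmpty()) {
                return false;
            }
        }
        return true;
    }

    // Parses the XML, prunes everything below the root, and serializes it back to a string.
    static String prune(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            pruneChildren(doc.getDocumentElement()); // the root itself is always kept
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not prune XML", e);
        }
    }
}
```

Calling `EmptyElementPruner.prune(xml)` on the wrapped input should leave only the elements that carry text or attributes. Because it works on the parsed tree, whitespace between tags does not stop `<Address>` or `<Employment>` from being removed.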

Upvotes: 4

blissapp

Reputation: 1370

You could preprocess using XSLT. I know it's considered a bit "Disco" nowadays, but it is fast and easy to apply.

From this tek-tips discussion, you could transform with XSLT to remove empty elements.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="@*|node()">
    <xsl:if test=". != '' or ./@* != ''">
      <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
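
To run a stylesheet like this from Java before unmarshalling, the JDK's javax.xml.transform API should be enough. The following is a sketch rather than a finished implementation. It assumes the input has a single root element, and the class name `XsltEmptyElementFilter` and the `<root>` wrapper are invented for the example. The stylesheet is embedded as a string for brevity:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class that applies the stylesheet above to an XML string.
class XsltEmptyElementFilter {

    // Same stylesheet as above, inlined so the example is self-contained.
    static final String STYLESHEET =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
          + "  <xsl:template match=\"@*|node()\">"
          + "    <xsl:if test=\". != '' or ./@* != ''\">"
          + "      <xsl:copy>"
          + "        <xsl:apply-templates select=\"@*|node()\"/>"
          + "      </xsl:copy>"
          + "    </xsl:if>"
          + "  </xsl:template>"
          + "</xsl:stylesheet>";

    // Transforms the input and returns the filtered XML without an XML declaration.
    static String filter(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not transform XML", e);
        }
    }
}
```

The result of `XsltEmptyElementFilter.filter(xml)` can then be handed to the JAXB unmarshaller. One caveat: the test `. != ''` treats whitespace-only content as non-empty, so indented input keeps group elements such as `<Address>` unless the whitespace is stripped first.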

Upvotes: 5
