Reputation: 1782

Suggestion to parse this XML in Java

I'm not new to Java, but I'm relatively new to XML parsing. I know a tiny bit about a lot of the XML tools out there, but not much about any of them. I'm also not an XML pro.

My particular problem is this: I've been given an XML document that I cannot modify, and I need to parse only selected bits of it into Java objects. Raw speed isn't much of a factor as long as it's reasonable. Likewise, the memory footprint doesn't need to be optimal, just not insane. I only need to read through the document once; after that I'll throw it in the bit bucket and just use my POJO.

So I'm open to suggestions: which tool would you use?
And would you kindly suggest a bit of starter code for my particular need?

Here's a snippet of sample XML and the associated POJO I'm trying to craft:

<xml>
  <item id="...">
    ...
  </item>
  <metadata>
    <resources>

      <resource>
        <ittype>Service_Links</ittype>
        <links>
          <link>
            <path>http://www.stackoverflow.com</path>
            <description>Stack Overflow</description>
          </link>
          <link>
            <path>http://www.google.com</path>
            <description>Google</description>
          </link>
        </links>
      </resource>

      <resource>
        <ittype>Article_Links</ittype>
        <links>
          ...
        </links>
      </resource>

      ...

    </resources>
  </metadata>
</xml>


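import java.util.List;
import org.simpleframework.xml.Attribute;
import org.simpleframework.xml.ElementList;
import org.simpleframework.xml.Path;

public class MyPojo {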

    @Attribute(name="id")
    @Path("item")
    public String id;

    @ElementList(entry="link")
    @Path("metadata/resources/resource/links")
    public List<Link> links;
}

NOTE: this question grew out of an earlier question of mine in which I tried to solve this with SimpleXml; I've reached the point where I think someone may be able to suggest a different route to the same problem.

Also note: I'm really hoping for a CLEAN solution, by which I mean one using annotations and/or XPath with the least amount of code. The last thing I want is a huge class file with huge, unwieldy methods... THAT I already have. I'm trying to find a better way.

Upvotes: 1

Answers (3)

Bane

Reputation: 1782

OK, so I settled on a solution that (to me) addresses my needs in the most reasonable way. Apologies to those who made other suggestions, but I liked this route better because it kept most of the parsing rules in annotations, and the little procedural code I had to write was minimal.

I ended up going with JAXB. Initially I thought JAXB could convert between XML and Java classes (in either direction) only when given an XSD. Then I discovered that JAXB has annotations that map XML onto a plain Java class without any XSD.

The XML file I'm working with is huge and very deep, but I only need bits and pieces of it here and there, and I was worried that figuring out what maps to where would become very difficult later on. So I structured a tree of folders modeled after the XML: each folder maps to an element, and each folder contains a POJO representing that element.

The problem is that sometimes an element has a child element several levels down with a single property I care about. It would be a pain to create four nested folders, plus a POJO for each, just to reach a single property. But that's how you do it with plain JAXB (at least as far as I can tell), so once again I was in a corner.

Then I stumbled on EclipseLink's JAXB implementation, MOXy. MOXy has an @XmlPath annotation that I could put in that parent POJO to navigate several levels down to a single property, without creating all those folders and element POJOs. Nice.

So I created something like this (note: I chose to use getters for cases where I need to massage the value):

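// imports used by both classes below (each public class lives in its own file)
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.List;
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlAttribute;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;
import org.eclipse.persistence.oxm.annotations.XmlPath;

// maps to the root-"xml" element in the file
@XmlRootElement( name="xml" )
@XmlAccessorType( XmlAccessType.FIELD )
public class Xml {

    // this is standard JAXB
    @XmlElement
    private Item item;
    public Item getItem() {
        return this.item;
    }

    // ...
}

// maps to the "<xml><item>"-element in the file
public class Item {

    // standard JAXB; maps to "<xml><item id="...">"
    @XmlAttribute
    private String id;
    public String getId() {
        return this.id;
    }

    // getting an attribute buried deep down
    // MOXy; maps to "<xml><item><rating average="...">"
    @XmlPath( "rating/@average" )
    private Double averageRating;
    public Double getAverageRating() {
        return this.averageRating;
    }

    // getting a list buried deep down
    // MOXy; maps to "<xml><item><service><identification><aliases><alias.../><alias.../>"
    @XmlPath( "service/identification/aliases/alias/text()" )
    private List<String> aliases;
    public List<String> getAliases() {
        return this.aliases;
    }

    // using a getter to massage the value
    @XmlElement(name="dateforindex")
    private String dateForIndex;
    public Date getDateForIndex() {
        // parse the string-value into a Date; the real format of <dateforindex>
        // isn't shown, so "yyyy-MM-dd" is only an assumed placeholder
        if (this.dateForIndex == null) {
            return null;
        }
        try {
            return new SimpleDateFormat("yyyy-MM-dd").parse(this.dateForIndex);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Bad dateforindex: " + this.dateForIndex, e);
        }
    }

}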
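One setup step the classes above depend on but don't show: @XmlPath is an EclipseLink annotation, so the JDK's default JAXB implementation would simply not apply it, and MOXy has to be selected as the JAXB provider. For the javax.xml.bind-based MOXy releases, the usual way is a jaxb.properties file placed in the same package as the mapped classes (here, alongside Xml and Item), containing:

```properties
# jaxb.properties, in the same package as Xml and Item
javax.xml.bind.context.factory=org.eclipse.persistence.jaxb.JAXBContextFactory
```

Newer Jakarta-based EclipseLink releases use a different property key, so check the MOXy documentation for the version in use.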

Also note that I chose to separate the XML objects from the model objects I actually use in the app: a factory transforms these crude objects into much more robust ones that the rest of the app works with.
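A minimal sketch of that split, under stated assumptions: RawItem, CatalogItem and CatalogItemFactory are hypothetical names, RawItem merely stands in for a JAXB-bound class like Item so the example runs without JAXB on the classpath, and defaulting a missing rating to 0.0 is an arbitrary illustrative choice.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a JAXB-bound "crude" object such as Item:
// any field may be null because it mirrors whatever the XML contained.
class RawItem {
    final String id;
    final Double averageRating;
    final List<String> aliases;

    RawItem(String id, Double averageRating, List<String> aliases) {
        this.id = id;
        this.averageRating = averageRating;
        this.aliases = aliases;
    }
}

// Hypothetical app-side model: immutable, with defaults instead of nulls.
final class CatalogItem {
    private final String id;
    private final double averageRating;
    private final List<String> aliases;

    CatalogItem(String id, double averageRating, List<String> aliases) {
        this.id = id;
        this.averageRating = averageRating;
        this.aliases = aliases;
    }

    String getId() { return id; }
    double getAverageRating() { return averageRating; }
    List<String> getAliases() { return aliases; }
}

// The factory is the only class that knows about both shapes.
final class CatalogItemFactory {
    private CatalogItemFactory() {}

    static CatalogItem fromXml(RawItem raw) {
        // assumed default: a missing rating becomes 0.0
        double rating = raw.averageRating != null ? raw.averageRating : 0.0;
        // defensive, read-only copy so the model can't be changed through the raw object
        List<String> aliases = raw.aliases == null
                ? Collections.<String>emptyList()
                : Collections.unmodifiableList(new ArrayList<>(raw.aliases));
        return new CatalogItem(raw.id, rating, aliases);
    }
}
```

Keeping the conversion in one place means a change in the XML layout touches only the annotated classes and the factory, not the code that uses the model.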

Upvotes: 1

Sumit Desai

Reputation: 1760

You can use a SAX parser (javax.xml.parsers.SAXParser) or a StAX parser (javax.xml.stream.XMLStreamReader). If you can afford more memory, you can also use a DOM parser. I'd say StAX is the best fit for you.
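A minimal StAX sketch of that suggestion, assuming the question's sample layout, where each <link> holds one <path> and one <description>. StaxLinkReader, Link and readLinks are hypothetical names, and the input is a String only to keep the example self-contained. The document is read in a single forward pass and no tree is built:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxLinkReader {

    // One <link> entry: the text of its <path> and <description>.
    public static final class Link {
        public final String path;
        public final String description;

        Link(String path, String description) {
            this.path = path;
            this.description = description;
        }
    }

    // Single forward pass over the document, collecting every <link> it meets.
    public static List<Link> readLinks(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<Link> links = new ArrayList<>();
        String path = null;
        String description = null;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    switch (reader.getLocalName()) {
                        case "link":        // new entry: reset the pending values
                            path = null;
                            description = null;
                            break;
                        case "path":        // getElementText() consumes up to </path>
                            path = reader.getElementText();
                            break;
                        case "description":
                            description = reader.getElementText();
                            break;
                        default:            // every other element is skipped
                            break;
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "link".equals(reader.getLocalName())) {
                    links.add(new Link(path, description));
                }
            }
        } finally {
            reader.close();
        }
        return links;
    }
}
```

The trade-off compared with annotations or XPath is that the "which element am I in" bookkeeping is written by hand, which grows quickly as more fields are needed.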

Upvotes: 0

Neil Coffey

Reputation: 21795

If your XML documents are relatively small (as appears to be the case here), I would use the DOM framework and XPath class. Here is some boilerplate DOM/XPath code from one of my tutorials:

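import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

File xmlFile = ...   // the XML file to parse
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(xmlFile);

XPath xp = XPathFactory.newInstance().newXPath();
String value = xp.evaluate("/path/to/element/text()", doc);
// .. reuse xp to get other values as required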

In other words, you basically:

1. get your XML into a Document object via a DocumentBuilder;
2. create an XPath object;
3. repeatedly call XPath.evaluate(), passing in the path of the element(s) you need and your Document.
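Applied to the question's sample, those steps might look like the sketch below. XPathLinkReader and its Result holder are hypothetical names, and the XML is passed as a String only to keep the example self-contained; the approach is the same with a File. The second query passes XPathConstants.NODESET so that every matching node comes back, not just the first one's text:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class XPathLinkReader {

    // What this sketch pulls out: the item id plus every link's <path>.
    public static final class Result {
        public final String id;
        public final List<String> linkPaths;

        Result(String id, List<String> linkPaths) {
            this.id = id;
            this.linkPaths = linkPaths;
        }
    }

    public static Result read(String xml) throws Exception {
        // Step 1: build the Document (the whole tree is held in memory)
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        // Step 2: one XPath object, reused for every query
        XPath xp = XPathFactory.newInstance().newXPath();

        // Step 3a: a single value; evaluate() returns it as a String
        String id = xp.evaluate("/xml/item/@id", doc);

        // Step 3b: many values; ask for a NODESET instead of a String
        NodeList nodes = (NodeList) xp.evaluate(
                "/xml/metadata/resources/resource/links/link/path",
                doc, XPathConstants.NODESET);
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            paths.add(nodes.item(i).getTextContent());
        }
        return new Result(id, paths);
    }
}
```

To keep only one kind of resource, a predicate can be added to the path, e.g. resource[ittype='Service_Links'] in place of resource.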

As you see, there's a little bit of fiddliness in getting hold of your Document object and like all good XML APIs, it throws a plethora of silly pointless checked exceptions. But apart from that, it's fairly no-nonsense for parsing simple small to medium XML documents whose structure is relatively fixed.

Upvotes: 0
