user1015721

Reputation: 149

Parsing XML for deeply nested data

I have an XML file that is structured something like this:

<element1>
    <element2>
        <element3>
            <elementIAmInterestedIn attribute="data">
                <element4>
                    <element5>
                        <element6>
                            <otherElementIAmInterestedIn>
                                <data1>text1</data1>
                                <data2>text2</data2>
                                <data3>text3</data3>
                            </otherElementIAmInterestedIn>
                        </element6>
                    </element5>
                </element4>
            </elementIAmInterestedIn>
            <elementIAmInterestedIn attribute="data">
                <element4>
                    <element5>
                        <element6>
                            <otherElementIAmInterestedIn>
                                <data1>text1</data1>
                                <data2>text2</data2>
                                <data3>text3</data3>
                            </otherElementIAmInterestedIn>
                        </element6>
                    </element5>
                </element4>
            </elementIAmInterestedIn>
            <elementIAmInterestedIn attribute="data">
                <element4>
                    <element5>
                        <element6>
                            <otherElementIAmInterestedIn>
                                <data1>text1</data1>
                                <data2>text2</data2>
                                <data3>text3</data3>
                            </otherElementIAmInterestedIn>
                        </element6>
                    </element5>
                </element4>
            </elementIAmInterestedIn>
        </element3>
    </element2>
</element1>

As you can see, I am interested in two elements, the first of which is deeply nested within the root element, and the second of which is deeply nested within that first element. There are multiple (sibling) elementIAmInterestedIn and otherElementIAmInterestedIn elements in the document.

I want to parse this XML file with Java and put the data from all the elementIAmInterestedIn and otherElementIAmInterestedIn elements into either a data structure or Java objects - it doesn't matter much to me as long as it is organized and I can access it later.

I'm able to write a recursive DOM parser method that does a depth-first traversal of the XML so that it touches every element. I also wrote a Java class with JAXB annotations that represents elementIAmInterestedIn. Then, in the recursive method, I can check when I get to an elementIAmInterestedIn and unmarshal it into an instance of the JAXB class. This works fine except that such an object should also contain multiple otherElementIAmInterestedIn.

This is where I'm stuck. How can I get the data out of otherElementIAmInterestedIn and assign it to the JAXB object? I've seen the @XmlElementWrapper annotation, but it seems to work for only one layer of nesting. Also, I cannot use @XmlPath.

Maybe I should scratch that idea and use a whole new approach. I'm really just getting started with XML parsing so perhaps I'm overlooking a more obvious solution. How would you parse an XML document structured like this and store the data in an organized way?

Upvotes: 1

Views: 2291

Answers (1)

ejoncas

Reputation: 329

Maybe you should use a SAX parser instead of DOM. With DOM you load the whole document into memory, while in your case you only want to read two kinds of elements. That is quite inefficient.

With a SAX parser you can pick out only the nodes you are interested in as the document streams past. Here is pseudocode for your task using the SAX parsing model:

1) Keep reading nodes until you reach an <elementIAmInterestedIn> node

2) Store its data (for example, the attribute value) in an instance of your class

3) Keep reading until you reach an <otherElementIAmInterestedIn> node

4) Store its data in the same object too, and save the object.

Loop over steps 1 to 4 until you reach the end of the document.
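The steps above could be sketched roughly as follows with the JDK's built-in SAX API. This is only a minimal sketch under some assumptions, not a definitive implementation: the names SaxSketch, Item and Handler are made up for illustration, each otherElementIAmInterestedIn is assumed to contain only simple text children (data1, data2, ...), and elementIAmInterestedIn elements are assumed not to be nested inside each other.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical names throughout; only the XML tag names come from the question.
class SaxSketch {

    /** Holds the data of one elementIAmInterestedIn. */
    static class Item {
        String attribute;
        // One map per nested otherElementIAmInterestedIn: child tag name -> text
        final List<Map<String, String>> others = new ArrayList<>();
    }

    static class Handler extends DefaultHandler {
        final List<Item> items = new ArrayList<>();
        private Item current;               // non-null while inside elementIAmInterestedIn
        private Map<String, String> other;  // non-null while inside otherElementIAmInterestedIn
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            text.setLength(0); // start collecting text for the new element
            if (qName.equals("elementIAmInterestedIn")) {
                current = new Item();
                current.attribute = atts.getValue("attribute");
            } else if (qName.equals("otherElementIAmInterestedIn") && current != null) {
                other = new LinkedHashMap<>();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called several times for one text node, so append.
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("otherElementIAmInterestedIn")) {
                if (current != null && other != null) {
                    current.others.add(other);
                }
                other = null;
            } else if (qName.equals("elementIAmInterestedIn")) {
                items.add(current); // the object is complete, save it
                current = null;
            } else if (other != null) {
                // A leaf child such as data1, data2, data3
                other.put(qName, text.toString().trim());
            }
            text.setLength(0);
        }
    }

    /** Parses the XML string and returns one Item per elementIAmInterestedIn. */
    static List<Item> parse(String xml) {
        try {
            Handler handler = new Handler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.items;
        } catch (Exception e) {
            throw new RuntimeException("Could not parse XML", e);
        }
    }
}
```

Calling SaxSketch.parse(xml) returns the collected objects in document order. To read from a file instead of a string, SAXParser also has a parse(File, DefaultHandler) overload. Because the handler keeps only the current object in its fields, the intermediate element4/element5/element6 levels never need to be modelled.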

If you try this approach, I suggest you first read this document to understand how a SAX parser works, since it is very different from the DOM approach: How to Use SAX

Upvotes: 2
