saba

Reputation: 539

How to parse an unknown XML structure using the StAX event-driven or stream API

I have successfully parsed XML of unknown structure using DOM, and I have also done it with SAX. Because my XML files are large, I now want to do the same with StAX, using either the event API or the stream API. I am curious about StAX and really want to learn how it works.

I did some research and wrote the following code.

This is the StAX stream version:

public static void main(String args[]) throws XMLStreamException, FileNotFoundException {
    XMLInputFactory xf = XMLInputFactory.newInstance();

    XMLStreamReader xsr = xf.createXMLStreamReader(new InputStreamReader(new FileInputStream("c:\\file.xml")));
    while (xsr.hasNext()) {

        int e = xsr.next();

        if (e == XMLStreamConstants.START_ELEMENT) {
            System.out.println("Element Start Name:" + xsr.getLocalName());
        }
        if (e == XMLStreamReader.END_ELEMENT) {
            System.out.println("Element End Name:" + xsr.getLocalName());
        }
        if (e == XMLStreamConstants.CHARACTERS) {
            System.out.println("Element Text:" + xsr.getText());
        }
    }
}

And this is the StAX event-driven version:

public static void main(String[] args) throws XMLStreamException, FileNotFoundException {
    XMLInputFactory xif = XMLInputFactory.newInstance();
    XMLEventReader xer = xif.createXMLEventReader(new InputStreamReader(new FileInputStream("c:\\file.xml")));

    while (xer.hasNext()) {

        XMLEvent e = xer.nextEvent();
        if (e.isCharacters()) {
            System.out.println("Element Text : " + e.asCharacters().getData());
        }
        if (e.isStartElement()) {
            System.out.println("Start Element : " + e.asStartElement().getName());
        }
        if (e.isEndElement()) {
            System.out.println("End Element : " + e.asEndElement().getName());
        }
    }
}

In both programs the parent node also prints blank text, but it should not: in the XML only the child nodes contain text, so only the child nodes' text should be printed. For example:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<student id="1">
  <fname>TestFirstName</fname>
  <lname>TestLastName</lname>
  <sectionname rollno="1">A</sectionname>
</student>

It should print TestFirstName, TestLastName, and so on. In other words, the checks if (e == XMLStreamConstants.CHARACTERS) and if (e.isCharacters()) should not be true for parent nodes. How can I modify my code so that it parses any XML file, whatever its depth or nesting level?

Upvotes: 0

Views: 1479

Answers (2)

saba

Reputation: 539

This is my solution using the StAX stream API:

public static void main(String[] args) throws FileNotFoundException, XMLStreamException {
    XMLInputFactory xf = XMLInputFactory.newInstance();
    XMLStreamReader xsr = xf.createXMLStreamReader(new InputStreamReader(new FileInputStream("c:\\test.xml")));
    String startElement = null;
    String endElement = null;
    String elementTxt = null;
    while (xsr.hasNext()) {
        int e = xsr.next();
        if (e == XMLStreamConstants.START_ELEMENT) {
            // Remember the most recently opened element.
            startElement = xsr.getLocalName();
        }
        if (e == XMLStreamConstants.END_ELEMENT) {
            endElement = xsr.getLocalName();
            // Print only when the element being closed is the one most recently opened.
            if (startElement.equalsIgnoreCase(endElement)) {
                System.out.println(" ElementName : " + startElement + " ElementText : " + elementTxt);
            }
        }
        if (e == XMLStreamConstants.CHARACTERS) {
            // Treat any text that contains a line break as formatting whitespace.
            elementTxt = (xsr.getText().contains("\n")) ? "" : xsr.getText();
        }
    }
}

This is my solution using the StAX event API:

public static void main(String[] args) throws XMLStreamException, FileNotFoundException {
    XMLInputFactory xif = XMLInputFactory.newInstance();
    XMLEventReader xer = xif.createXMLEventReader(new InputStreamReader(new FileInputStream("c:\\test.xml")));
    String startElement = null;
    String endElement = null;
    String elementTxt = null;
    while (xer.hasNext()) {

        XMLEvent e = xer.nextEvent();
        if (e.isCharacters()) {
            // Treat any text that contains a line break as formatting whitespace.
            elementTxt = (e.asCharacters().getData().contains("\n")) ? "" : e.asCharacters().getData();
        }
        if (e.isStartElement()) {
            // Remember the most recently opened element.
            startElement = e.asStartElement().getName().toString();
        }
        if (e.isEndElement()) {
            endElement = e.asEndElement().getName().toString();
            // Print only when the element being closed is the one most recently opened.
            if (startElement.equalsIgnoreCase(endElement)) {
                System.out.println(" ElementName : " + startElement + " ElementText : " + elementTxt);
            }
        }
    }
}
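
For XML of arbitrary depth, one possible variation is to keep a stack of open elements with one text buffer per element, so that text is attributed to the element it belongs to and multi-line text is not discarded. The following is only a sketch under some assumptions: the class name LeafTextCollector, the method collect and the "path=text" output format are made up for illustration, and the XML is read from a String instead of a file so the example is self-contained.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of StAX: lists every element that has text of its own.
class LeafTextCollector {

    // Returns one "path=text" entry per element with non-blank text, in closing order.
    static List<String> collect(String xml) throws XMLStreamException {
        XMLInputFactory xif = XMLInputFactory.newInstance();
        // Ask the parser to merge adjacent character data into one CHARACTERS event.
        xif.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader xsr = xif.createXMLStreamReader(new StringReader(xml));

        Deque<String> names = new ArrayDeque<>();        // open elements, innermost first
        Deque<StringBuilder> texts = new ArrayDeque<>(); // one text buffer per open element
        List<String> result = new ArrayList<>();
        try {
            while (xsr.hasNext()) {
                switch (xsr.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        names.push(xsr.getLocalName());
                        texts.push(new StringBuilder());
                        break;
                    case XMLStreamConstants.CHARACTERS:
                        // Skip whitespace-only chunks (indentation, line breaks).
                        if (!xsr.isWhiteSpace() && !texts.isEmpty()) {
                            texts.peek().append(xsr.getText());
                        }
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        String text = texts.pop().toString();
                        if (!text.isEmpty()) {
                            result.add(path(names) + "=" + text);
                        }
                        names.pop();
                        break;
                    default:
                        break;
                }
            }
        } finally {
            xsr.close();
        }
        return result;
    }

    // Builds "/root/child/..." from the stack, outermost element first.
    private static String path(Deque<String> names) {
        StringBuilder sb = new StringBuilder();
        Iterator<String> it = names.descendingIterator();
        while (it.hasNext()) {
            sb.append('/').append(it.next());
        }
        return sb.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<student id=\"1\">\n"
                + "  <fname>TestFirstName</fname>\n"
                + "  <lname>TestLastName</lname>\n"
                + "  <sectionname rollno=\"1\">A</sectionname>\n"
                + "</student>";
        for (String entry : collect(xml)) {
            System.out.println(entry);
        }
    }
}
```

Because each open element has its own buffer, a parent that contains only indentation ends up with an empty buffer and is skipped, whatever the nesting depth.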

Upvotes: 1

csauvanet

Reputation: 554

The event sequence is correct. You get the blank Characters events because of the pretty-print formatting (line breaks, spaces or tabs) between the elements. If your XML were written flat, with no whitespace between elements, you would not get these additional events.
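
A quick way to see this is to count the whitespace-only Characters events for a pretty-printed document and for a flat one. This is a minimal sketch; the class name WhitespaceEventCounter and the method countWhitespaceEvents are invented for the example, and the two XML strings are shortened versions of the document in the question.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper used only to illustrate where the blank events come from.
class WhitespaceEventCounter {

    // Counts the Characters events that consist only of whitespace.
    static int countWhitespaceEvents(String xml) throws XMLStreamException {
        XMLEventReader xer = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        int count = 0;
        try {
            while (xer.hasNext()) {
                XMLEvent e = xer.nextEvent();
                if (e.isCharacters() && e.asCharacters().isWhiteSpace()) {
                    count++;
                }
            }
        } finally {
            xer.close();
        }
        return count;
    }

    public static void main(String[] args) throws XMLStreamException {
        String pretty = "<student id=\"1\">\n  <fname>TestFirstName</fname>\n</student>";
        String flat = "<student id=\"1\"><fname>TestFirstName</fname></student>";
        System.out.println("pretty-printed: " + countWhitespaceEvents(pretty));
        System.out.println("flat: " + countWhitespaceEvents(flat));
    }
}
```

The flat document should report no whitespace-only events, while the pretty-printed one reports one for each run of indentation between tags.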

The StAX documentation states that "ignorable whitespace and significant whitespace are also reported as Character events", so you just need to filter out the whitespace. To do so, add the test !e.asCharacters().isWhiteSpace():

XMLEvent e = xer.nextEvent();
if (e.isCharacters() && !e.asCharacters().isWhiteSpace()) {
    System.out.println("Element Text : "+e.asCharacters().getData());
}

That should filter out the blank events, leaving only the events you expect.
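
The same filter can be applied with the stream (cursor) API, where XMLStreamReader offers its own isWhiteSpace() method. The sketch below assumes the XML is supplied as a String so it can run on its own; the class name StreamWhitespaceFilter and the method textValues are made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper showing the whitespace test with the cursor API.
class StreamWhitespaceFilter {

    // Collects the non-whitespace text chunks of the document, in document order.
    static List<String> textValues(String xml) throws XMLStreamException {
        XMLStreamReader xsr = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> values = new ArrayList<>();
        try {
            while (xsr.hasNext()) {
                int e = xsr.next();
                // isWhiteSpace() is true when the current text is all whitespace.
                if (e == XMLStreamConstants.CHARACTERS && !xsr.isWhiteSpace()) {
                    values.add(xsr.getText());
                }
            }
        } finally {
            xsr.close();
        }
        return values;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<student id=\"1\">\n"
                + "  <fname>TestFirstName</fname>\n"
                + "  <lname>TestLastName</lname>\n"
                + "  <sectionname rollno=\"1\">A</sectionname>\n"
                + "</student>";
        System.out.println(textValues(xml));
    }
}
```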

Upvotes: 1
