hakki

Reputation: 6521

SAXParser - Handle tags with the same name at different levels of the XML structure

I want to extract some values from a news site's XML feed with SAXParser, but its structure is hard for me because I am new to XML and SAX.

Issue: the news site's XML uses the SAME TAG NAME for the site name and for the news titles.

When I run the Java code it works without errors; the problem is the output.

How can I get only the <title> child of the <item> tag? I don't want to show the site title in my application. This is a big issue for me.

XML Side

<channel>

   <title>Site Name</title>

   <item>  
       <title>News Title!</title>       
   </item>

</channel>

Java Side

There are no errors in the Java file :)

try {

    SAXParserFactory factory = SAXParserFactory.newInstance();
    SAXParser saxParser = factory.newSAXParser();

    DefaultHandler handler = new DefaultHandler() {

        boolean newsTitle = false;

        public void startElement(String uri, String localName,
                String qName, Attributes attributes)
                throws SAXException {

            //System.out.println("Start Element :" + qName);

            if (qName.equalsIgnoreCase("title")) {
                newsTitle = true;
            }

        }

        public void endElement(String uri, String localName,
                String qName) throws SAXException {

            //System.out.println("End Element :" + qName);

        }

        public void characters(char[] ch, int start, int length)
                throws SAXException {

            if (newsTitle) {
                System.out.println("Title : "
                        + new String(ch, start, length));
                newsTitle = false;
            }

        }

    };

    saxParser.parse("C:\\ntv.xml", handler);

} catch (Exception e) {
    e.printStackTrace();
}

OUTPUT:

Title : Site Name

Title : News Title!

Upvotes: 2

Views: 2046

Answers (2)

sgp15

Reputation: 1280

You can use XPath rather than parsing your XML using SAX.

The XPath expression for your case is:

/channel/item/title

Example code:

import org.xml.sax.InputSource;

import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import java.io.StringReader;

public class XPathTest {

    public static void main(String[] args) throws XPathExpressionException {

        String xml = "<channel>\n" +
                "\n" +
                "   <title>Site Name</title>\n" +
                "\n" +
                "   <item>  \n" +
                "       <title>News Title!</title>       \n" +
                "   </item>\n" +
                "\n" +
                "</channel>";

        String result = XPathFactory.newInstance().newXPath()
                .compile("/channel/item/title")
                .evaluate(new InputSource(new StringReader(xml)));
        System.out.println(result);
    }
}
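Note that XPathExpression.evaluate(InputSource) returns only the string value of the first matching node, so a feed with several <item> elements would print just the first news title. A minimal sketch that collects every item title by requesting XPathConstants.NODESET instead (the class and method names are illustrative, and it assumes the same /channel/item/title structure as the sample):

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathAllTitles {

    // Returns the text of every <title> that is a direct child of an <item>
    static List<String> itemTitles(String xml) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // NODESET makes evaluate() return all matches, not just the first one's string value
        NodeList nodes = (NodeList) xpath.evaluate(
                "/channel/item/title",
                new InputSource(new StringReader(xml)),
                XPathConstants.NODESET);
        List<String> titles = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            titles.add(nodes.item(i).getTextContent());
        }
        return titles;
    }

    public static void main(String[] args) throws XPathExpressionException {
        String xml = "<channel><title>Site Name</title>"
                + "<item><title>First</title></item>"
                + "<item><title>Second</title></item></channel>";
        System.out.println(itemTitles(xml)); // [First, Second]
    }
}
```

The site-level <title> is never matched because the expression requires an <item> parent.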

Upvotes: 1

Nathan Hughes

Reputation: 96385

You can add a stack to your DefaultHandler. When you encounter a tag in startElement, push it onto the stack; in endElement, pop the topmost tag off the stack. When you want to know where you are in the document, check whether the stack holds /channel/item/title or just /channel/title.
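A minimal sketch of the stack idea, assuming the structure from the question's sample XML (StackTitleHandler is a hypothetical name). It compares qName because the default SAXParserFactory is not namespace-aware, and it collects the text and reads it in endElement rather than printing inside characters:

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class StackTitleHandler extends DefaultHandler {

    // Currently open elements, outermost first, e.g. [channel, item, title]
    private final Deque<String> path = new ArrayDeque<>();
    private final StringBuilder text = new StringBuilder();
    final List<String> newsTitles = new ArrayList<>();

    // True only while the parser is inside /channel/item/title
    private boolean inNewsTitle() {
        return String.join("/", path).equals("channel/item/title");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        path.addLast(qName); // push
        if (inNewsTitle()) {
            text.setLength(0); // start collecting a new news title
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inNewsTitle()) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (inNewsTitle()) {
            newsTitles.add(text.toString().trim()); // drop surrounding whitespace
        }
        path.removeLast(); // pop
    }

    static List<String> parse(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        StackTitleHandler handler = new StackTitleHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.newsTitles;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<channel>\n   <title>Site Name</title>\n"
                + "   <item>\n       <title>News Title!</title>\n   </item>\n</channel>";
        System.out.println(parse(xml)); // [News Title!]
    }
}
```

The site name is skipped because, while it is being parsed, the stack only holds channel/title.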

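Use localName instead of qName if you don't care about namespaces, since the qName may have a namespace prefix prepended to it. Note that localName is only filled in when the parser is namespace-aware, so call factory.setNamespaceAware(true) before creating the parser; otherwise localName is an empty string.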
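A small sketch of the difference, parsing the same prefixed element with namespace awareness switched on and off. NameDemo is a hypothetical name, and dc:title with the Dublin Core namespace URI is only an example of a prefixed element:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class NameDemo {

    // Records "qName|localName" for every start tag
    static List<String> names(String xml, boolean namespaceAware) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(namespaceAware); // localName depends on this setting
        List<String> seen = new ArrayList<>();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                seen.add(qName + "|" + localName);
            }
        });
        return seen;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<channel xmlns:dc=\"http://purl.org/dc/elements/1.1/\">"
                + "<dc:title>Site Name</dc:title></channel>";
        // Namespace-aware: localName is the name without the "dc:" prefix
        System.out.println(names(xml, true));
        // Not namespace-aware: rely on qName, which keeps the prefix
        System.out.println(names(xml, false));
    }
}
```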

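Also, the way you're using the characters method is not correct (a common problem): the parser may call characters() several times for a single text node, so you should collect the text in a buffer and read it in endElement. See the explanation in the SAX tutorial.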
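A sketch of the buffering pattern on its own (BufferedTitleHandler is a hypothetical name). It deliberately does not tell the site title apart from item titles, which is what the stack approach is for; it only shows that the complete text is available at the end tag. An entity reference such as &amp; is a typical place where a parser may split the text into several chunks:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class BufferedTitleHandler extends DefaultHandler {

    private final StringBuilder text = new StringBuilder();
    private boolean inTitle = false;
    final List<String> titles = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("title")) {
            inTitle = true;
            text.setLength(0); // reset the buffer for this element
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inTitle) {
            // One text node may arrive in several chunks, so append instead of printing
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("title")) {
            titles.add(text.toString()); // the text is complete only at the end tag
            inTitle = false;
        }
    }

    static List<String> parse(String xml) throws Exception {
        BufferedTitleHandler handler = new BufferedTitleHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.titles;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parse("<item><title>News &amp; Weather</title></item>")); // [News & Weather]
    }
}
```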

Upvotes: 1
