Mukul Aggarwal

Reputation: 1585

XML parsing using SAX parser in Java

I am trying to parse an RSS XML feed, but I am stuck on the description: my program stops reading the description content when it encounters an apostrophe (').

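Code to parse the XML:

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
// StringEscapeUtils comes from Apache Commons Lang; the package name
// (org.apache.commons.lang or org.apache.commons.lang3) depends on the version in use.

public class RSSAX {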

String channel_title="";

public void displayRSS()
{

    try {

        SAXParserFactory spf =  SAXParserFactory.newInstance();
        SAXParser sp = spf.newSAXParser();
        sp.parse("http://www.ronkaplansbaseballbookshelf.com/feed/podcast/", new RSSHandler());


    } catch (Exception e) {
        // TODO: handle exception
        System.out.println("Message is "+e.getMessage());
    }

}

private class RSSHandler extends DefaultHandler
{
    private boolean isItem = false;
    private String tagName=""; 

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) throws SAXException {
        this.tagName= qName;
        if(qName.equals("item"))
        {
            this.isItem=true;
        }

    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
         this.tagName="";
         if(qName.equals("item"))
         {
             System.out.println("========================");
             this.isItem=false;
         }


    }

    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {

        if(this.isItem)
        {
            //System.out.println("tagname is "+this.tagName);
            if(this.tagName.equals("title"))
            {
                System.out.println("title is "+(new String(ch,start,length)));
                this.tagName="";
            }
            else if(this.tagName.equals("link"))
            {
                System.out.println("link is "+(new String(ch,start,length)));
                this.tagName="";
            }
            else if(this.tagName.equals("description"))
            {
                String test=(new String(ch,start,length)).replaceAll("\\<.*?>","");
                test=StringEscapeUtils.escapeXml(StringEscapeUtils.unescapeXml(test));
                System.out.println("description is "+test);
                this.tagName="";
            }
            else if(this.tagName.equals("comments"))
            {
                System.out.println("comment link is "+(new String(ch,start,length)));
                this.tagName="";
            }
            else if(this.tagName.equals("pubDate"))
            {
                System.out.println("pubDate is "+(new String(ch,start,length)));
                this.tagName="";
            }
            else if(this.tagName.equals("category"))
            {
                System.out.println("Category is "+(new String(ch,start,length)));
                this.tagName="";
            }
            else if(this.tagName.equals("content:encoded"))
            {
                System.out.println("content:encoded is "+(new String(ch,start,length)));
                //this.tagName="";
            }

        }

    }

}

} // closes RSSAX



Output:

title is The Bookshelf Conversation: Filip Bondy
link is http://www.ronkaplansbaseballbookshelf.com/2015/08/04/the-bookshelf-conversation-filip-bondy/
pubDate is Tue, 04 Aug 2015 14:31:45 +0000
comment link is http://www.ronkaplansbaseballbookshelf.com/2015/08/04/the-bookshelf-conversation-filip-bondy/#comments
Category is 2015 title Category is Author profile/interview by Ron Kaplan

description is My New Jersey landsman and veteran sportswriter Filip Bondy has crafted a fun volume on one of the most famous games in the history of the national pastime. Whenever there

It stops parsing the description when it reaches the apostrophe in "there's".

Upvotes: 0

Views: 156

Answers (2)

Mukul Aggarwal

Reputation: 1585

You can use a StAX parser instead. To make XMLStreamReader return the text as a single string, set the coalescing property on the XMLInputFactory:

factory.setProperty("javax.xml.stream.isCoalescing", true);

With coalescing enabled, adjacent character data is returned as one string; see the XMLStreamReader.next() documentation.
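
A minimal sketch of how that property might be used, assuming the feed text is already available as a String. The class name CoalescingStaxExample and the method firstDescriptionText are made up for illustration and are not part of the question's code; XMLInputFactory.IS_COALESCING is the constant for the "javax.xml.stream.isCoalescing" property named above.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, for illustration only.
class CoalescingStaxExample {

    // Returns the text of the first <description> element, or null if there is none.
    static String firstDescriptionText(String xml) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Ask the reader to merge adjacent character data into one CHARACTERS event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);

        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        try {
            boolean inDescription = false;
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    inDescription = "description".equals(reader.getLocalName());
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    inDescription = false;
                } else if (event == XMLStreamConstants.CHARACTERS && inDescription) {
                    // With coalescing on, this should be the element's whole text.
                    return reader.getText();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }
}
```

Without the coalescing property, a reader is allowed to report the same text as several CHARACTERS events, so the early return above could hand back only the first fragment.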

Upvotes: 0

Michael Kay

Reputation: 163262

A SAX parser can break up text nodes any way it likes, and deliver the content in multiple calls to the characters() method. It's your job to reassemble the pieces.
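
As a rough sketch of that reassembly (not the asker's code; the class name DescriptionCollector and the helper parseDescriptions are invented for illustration), one common approach is to append every characters() chunk to a StringBuilder and only use the text once endElement() fires. Parsers often split the text at entity or character references, which would be consistent with the output being cut off at the apostrophe, though that is an assumption about this particular feed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler, for illustration only.
class DescriptionCollector extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private final List<String> descriptions = new ArrayList<>();
    private boolean inItem = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (qName.equals("item")) {
            inItem = true;
        }
        // Start with an empty buffer for each element. This assumes the
        // elements of interest contain only text, no child elements.
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one text node: just accumulate.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // The element is complete here, so the buffer holds its full text.
        if (inItem && qName.equals("description")) {
            descriptions.add(text.toString().trim());
        }
        if (qName.equals("item")) {
            inItem = false;
        }
        text.setLength(0);
    }

    // Parses the given XML string and returns the description of every <item>.
    static List<String> parseDescriptions(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        DescriptionCollector handler = new DescriptionCollector();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.descriptions;
    }
}
```

Applied to the handler in the question, the same idea would mean moving the println calls out of characters() and into endElement(), and no longer clearing tagName after the first chunk.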

Upvotes: 1
