I am trying to parse an RSS XML feed, but I am stuck on the description: my program stops parsing the description content when it encounters an apostrophe (').
Code to parse the XML:
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.apache.commons.lang3.StringEscapeUtils; // Apache Commons Lang (exact version assumed; not shown in the original)
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class RSSAX {
    String channel_title = "";

    public void displayRSS()
    {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            SAXParser sp = spf.newSAXParser();
            sp.parse("http://www.ronkaplansbaseballbookshelf.com/feed/podcast/", new RSSHandler());
        } catch (Exception e) {
            // TODO: handle exception
            System.out.println("Message is " + e.getMessage());
        }
    }

    private class RSSHandler extends DefaultHandler
    {
        private boolean isItem = false;
        private String tagName = ""; // name of the element whose text is expected next

        @Override
        public void startElement(String uri, String localName, String qName,
                Attributes attributes) throws SAXException {
            this.tagName = qName;
            if (qName.equals("item"))
            {
                this.isItem = true;
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName)
                throws SAXException {
            this.tagName = "";
            if (qName.equals("item"))
            {
                System.out.println("========================");
                this.isItem = false;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length)
                throws SAXException {
            if (this.isItem)
            {
                //System.out.println("tagname is " + this.tagName);
                if (this.tagName.equals("title"))
                {
                    System.out.println("title is " + (new String(ch, start, length)));
                    this.tagName = "";
                }
                else if (this.tagName.equals("link"))
                {
                    System.out.println("link is " + (new String(ch, start, length)));
                    this.tagName = "";
                }
                else if (this.tagName.equals("description"))
                {
                    // Strip HTML tags, then normalise the XML escaping
                    String test = (new String(ch, start, length)).replaceAll("\\<.*?>", "");
                    test = StringEscapeUtils.escapeXml(StringEscapeUtils.unescapeXml(test));
                    System.out.println("description is " + test);
                    this.tagName = "";
                }
                else if (this.tagName.equals("comments"))
                {
                    System.out.println("comment link is " + (new String(ch, start, length)));
                    this.tagName = "";
                }
                else if (this.tagName.equals("pubDate"))
                {
                    System.out.println("pubDate is " + (new String(ch, start, length)));
                    this.tagName = "";
                }
                else if (this.tagName.equals("category"))
                {
                    System.out.println("Category is " + (new String(ch, start, length)));
                    this.tagName = "";
                }
                else if (this.tagName.equals("content:encoded"))
                {
                    System.out.println("content:encoded is " + (new String(ch, start, length)));
                    //this.tagName = "";
                }
            }
        }
    }
}
Output:
title is The Bookshelf Conversation: Filip Bondy
link is http://www.ronkaplansbaseballbookshelf.com/2015/08/04/the-bookshelf-conversation-filip-bondy/
pubDate is Tue, 04 Aug 2015 14:31:45 +0000
comment link is http://www.ronkaplansbaseballbookshelf.com/2015/08/04/the-bookshelf-conversation-filip-bondy/#comments
Category is 2015 title
Category is Author profile/interview by Ron Kaplan
description is My New Jersey landsman and veteran sportswriter Filip Bondy has crafted a fun volume on one of the most famous games in the history of the national pastime. Whenever there
It stops parsing the description when it reaches "there's".
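You can use a StAX parser instead. To force XMLStreamReader to return contiguous text as a single string, set this property on the XMLInputFactory:
factory.setProperty("javax.xml.stream.isCoalescing", true);
With it, the reader returns a single CHARACTERS event for contiguous element content or CDATA sections; refer to the XMLStreamReader.next() documentation.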
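A minimal sketch of the StAX approach, using the JDK's built-in javax.xml.stream API. The class RssDescriptionStax, the method parseDescriptions and the sample feed string are hypothetical; only descriptions inside an item are collected, mirroring the isItem flag in the question. XMLInputFactory.IS_COALESCING is the constant whose value is "javax.xml.stream.isCoalescing".

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class RssDescriptionStax {

    /** Returns the text of every <description> that sits inside an <item>. */
    static List<String> parseDescriptions(String xml) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Same as factory.setProperty("javax.xml.stream.isCoalescing", true)
        factory.setProperty(XMLInputFactory.IS_COALESCING, true);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));

        List<String> descriptions = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        boolean inItem = false;
        boolean inDescription = false;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                switch (event) {
                    case XMLStreamConstants.START_ELEMENT:
                        if (reader.getLocalName().equals("item")) {
                            inItem = true;
                        } else if (inItem && reader.getLocalName().equals("description")) {
                            inDescription = true;
                            text.setLength(0); // fresh buffer for this description
                        }
                        break;
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                        // With coalescing on, contiguous text arrives as one event;
                        // appending still works if, e.g., a comment splits the text
                        if (inDescription) {
                            text.append(reader.getText());
                        }
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        if (inDescription && reader.getLocalName().equals("description")) {
                            descriptions.add(text.toString());
                            inDescription = false;
                        } else if (reader.getLocalName().equals("item")) {
                            inItem = false;
                        }
                        break;
                    default:
                        break;
                }
            }
        } finally {
            reader.close();
        }
        return descriptions;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<rss><channel><item><title>T</title>"
                + "<description>Whenever there&#8217;s a game, there&apos;s a story.</description>"
                + "</item></channel></rss>";
        for (String d : parseDescriptions(xml)) {
            System.out.println("description is " + d);
        }
    }
}
```

For the real feed, the same loop should work with factory.createXMLStreamReader(InputStream) fed from the feed URL's stream.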
A SAX parser can break up text nodes any way it likes, and deliver the content in multiple calls to the characters() method. It's your job to reassemble the pieces.
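A minimal sketch of that reassembly with the JDK's SAX parser: characters() only appends each chunk to a StringBuilder, and the complete text is used in endElement(), instead of printing the first chunk and clearing tagName as the question's handler does. The class RssDescriptionSax, the method parseDescriptions and the sample feed string are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class RssDescriptionSax {

    /** Collects the full text of every <description> inside an <item>. */
    static class DescriptionHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        private boolean inItem = false;
        private boolean inDescription = false;
        final List<String> descriptions = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            if (qName.equals("item")) {
                inItem = true;
            } else if (inItem && qName.equals("description")) {
                inDescription = true;
                text.setLength(0); // start a fresh buffer for this element
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called several times for one text node: append every chunk
            if (inDescription) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (inDescription && qName.equals("description")) {
                descriptions.add(text.toString()); // all chunks have arrived by now
                inDescription = false;
            } else if (qName.equals("item")) {
                inItem = false;
            }
        }
    }

    static List<String> parseDescriptions(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        DescriptionHandler handler = new DescriptionHandler();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.descriptions;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<rss><channel><item><title>T</title>"
                + "<description>Whenever there&#8217;s a game, there&apos;s a story.</description>"
                + "</item></channel></rss>";
        for (String d : parseDescriptions(xml)) {
            System.out.println("description is " + d);
        }
    }
}
```

The same buffering pattern applies to title, link and the other elements the question prints; any tag stripping or unescaping is best done on the finished string in endElement().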