Reputation: 15
I'm trying to read some feeds generated by a WordPress website using a SAX parser in an Android app. I'm not getting the full text of CDATA sections; the text is cut off somewhere in the middle. I read that a SAX parser sometimes delivers text in chunks, but how do I merge the chunks in my handler? Any help would be appreciated! Here is my RSSHandler.java and part of the XML from the WordPress RSS feed:
RSSHandler.java:
package com.example.vlada.vlada;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public class RSSHandler extends DefaultHandler {
final int state_unknown = 0;
final int state_title = 1;
final int state_sadrzaj_posta = 2;
final int state_link = 3;
final int state_pubdate = 4;
int currentState = state_unknown;
RSSFeed feed;
RSSItem item;
boolean itemFound = false;
RSSHandler(){
}
RSSFeed getFeed(){
return feed;
}
@Override
public void startDocument() throws SAXException {
// TODO Auto-generated method stub
feed = new RSSFeed();
item = new RSSItem();
}
@Override
public void endDocument() throws SAXException {
// TODO Auto-generated method stub
}
@Override
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException {
// TODO Auto-generated method stub
if (localName.equalsIgnoreCase("item")){
itemFound = true;
item = new RSSItem();
currentState = state_unknown;
}
else if (localName.equalsIgnoreCase("title")){
currentState = state_title;
}
else if (localName.equalsIgnoreCase("sadrzaj_posta")){
currentState = state_sadrzaj_posta;
}
else if (localName.equalsIgnoreCase("link")){
currentState = state_link;
}
else if (localName.equalsIgnoreCase("pubdate")){
currentState = state_pubdate;
}
else{
currentState = state_unknown;
}
}
@Override
public void endElement(String uri, String localName, String qName)
throws SAXException {
// TODO Auto-generated method stub
if (localName.equalsIgnoreCase("item")){
feed.addItem(item);
}
}
@Override
public void characters(char[] ch, int start, int length)
throws SAXException {
// TODO Auto-generated method stub
String strCharacters = new String(ch,start,length);
if (itemFound==true){
// "item" tag found, it's item's parameter
switch(currentState){
case state_title:
item.setTitle(strCharacters);
break;
case state_sadrzaj_posta:
item.setSadrzaj_posta(strCharacters);
break;
case state_link:
item.setLink(strCharacters);
break;
case state_pubdate:
item.setPubdate(strCharacters);
break;
default:
break;
}
}
else{
// not "item" tag found, it's feed's parameter
switch(currentState){
case state_title:
feed.setTitle(strCharacters);
break;
case state_sadrzaj_posta:
feed.setSadrzajPosta(strCharacters);
break;
case state_link:
feed.setLink(strCharacters);
break;
case state_pubdate:
feed.setPubdate(strCharacters);
break;
default:
break;
}
}
currentState = state_unknown;
}
}
Part of the XML:
<sadrzaj_posta>
<![CDATA[
<p>Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industrys standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.</p>
]]>
</sadrzaj_posta>
I'm getting just:
Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industrys standard dummy text ever since the 1500s, when an unknown printer took a galley of ty
Upvotes: 0
Views: 230
Reputation: 1007554
characters() can be called many times within a single element, each call delivering one piece of the element's text. Do not attempt to consume the text in characters(). Instead, append each piece to a StringBuilder or something similar, then process the result in endElement() or some similar point where you know that you have all the text.
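A minimal sketch of that approach, independent of the RSSFeed and RSSItem classes from the question. The class name AccumulatingHandler, the posts list, and the parse() helper are hypothetical names used only for illustration. It assumes the element of interest has no child elements, since the buffer is reset at every start tag.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the complete text of every <sadrzaj_posta> element.
class AccumulatingHandler extends DefaultHandler {
    // Buffer for the text of the element currently being read.
    private final StringBuilder text = new StringBuilder();
    private final List<String> posts = new ArrayList<>();

    List<String> getPosts() {
        return posts;
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) {
        // A new element begins: discard whatever was buffered before it.
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per element, so only append here.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // All chunks for this element have arrived; now it is safe to use them.
        if (localName.equalsIgnoreCase("sadrzaj_posta")) {
            posts.add(text.toString().trim());
        }
        text.setLength(0);
    }

    // Convenience helper for trying the handler on an XML string.
    static List<String> parse(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // Namespace awareness makes localName non-empty on a desktop JVM.
            factory.setNamespaceAware(true);
            AccumulatingHandler handler = new AccumulatingHandler();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.getPosts();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

In the handler from the question, the same idea would mean appending in characters(), moving the item.setTitle(...), item.setSadrzaj_posta(...) and similar calls into endElement(), and passing them text.toString() there.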
Upvotes: 1