bladimir
bladimir

Reputation: 15

Android SAX parser not getting full text from CDATA

I'trying to read some feeds generated by wordpress website using SAX parser in Andriod app. I'm not getting full text in CDATA sections, text are splitted somewhere in the middle. I read that SAX parser sometimes splits text in chunks, but how to merge all this stuff in reader? Any help will be gracefull! Here is my RSSHandler.java and part of my xml from wordpress rss:

RSSHandler.java:

package com.example.vlada.vlada;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class RSSHandler extends DefaultHandler {

    final int state_unknown = 0;
    final int state_title = 1;
    final int state_sadrzaj_posta = 2;
    final int state_link = 3;
    final int state_pubdate = 4;
    int currentState = state_unknown;

    RSSFeed feed;
    RSSItem item;

    boolean itemFound = false;

    RSSHandler(){
    }

    RSSFeed getFeed(){
        return feed;
    }

    @Override
    public void startDocument() throws SAXException {
        // TODO Auto-generated method stub
        feed = new RSSFeed();
        item = new RSSItem();

    }

    @Override
    public void endDocument() throws SAXException {
        // TODO Auto-generated method stub
    }

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) throws SAXException {
        // TODO Auto-generated method stub

        if (localName.equalsIgnoreCase("item")){
            itemFound = true;
            item = new RSSItem();
            currentState = state_unknown;
        }
        else if (localName.equalsIgnoreCase("title")){
            currentState = state_title;
        }
        else if (localName.equalsIgnoreCase("sadrzaj_posta")){
            currentState = state_sadrzaj_posta;
        }
        else if (localName.equalsIgnoreCase("link")){
            currentState = state_link;
        }
        else if (localName.equalsIgnoreCase("pubdate")){
            currentState = state_pubdate;
        }
        else{
            currentState = state_unknown;
        }

    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        // TODO Auto-generated method stub
        if (localName.equalsIgnoreCase("item")){
            feed.addItem(item);
        }
    }



    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        // TODO Auto-generated method stub

        String strCharacters = new String(ch,start,length);

        if (itemFound==true){
        // "item" tag found, it's item's parameter
            switch(currentState){
            case state_title:
                item.setTitle(strCharacters);
                break;
            case state_sadrzaj_posta:
                item.setSadrzaj_posta(strCharacters);
                break;
            case state_link:
                item.setLink(strCharacters);
                break;
            case state_pubdate:
                item.setPubdate(strCharacters);
                break;  
            default:
                break;
            }
        }
        else{
        // not "item" tag found, it's feed's parameter
            switch(currentState){
            case state_title:
                feed.setTitle(strCharacters);
                break;
            case state_sadrzaj_posta:
                feed.setSadrzajPosta(strCharacters);
                break;
            case state_link:
                feed.setLink(strCharacters);
                break;
            case state_pubdate:
                feed.setPubdate(strCharacters);
                break;  
            default:
                break;
            }
        }

        currentState = state_unknown;
    }


}

Part of xml:

<sadrzaj_posta>
<![CDATA[
<p>Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industrys standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.</p>
]]>
</sadrzaj_posta>

I'm getting just:

Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industrys standard dummy text ever since the 1500s, when an unknown printer took a galley of ty

Upvotes: 0

Views: 230

Answers (1)

CommonsWare
CommonsWare

Reputation: 1007554

characters() can be called many times within an element, to build up the whole text of text nodes. Do not attempt to consume the text in characters(). Instead, append them to a StringBuilder or something, then process the results in endElement() or some similar point where you know that you have all the text.

Upvotes: 1

Related Questions