PavanMysore

Reputation: 189

SAXParser fails for specific data

I am trying to parse an XML file that looks like this:

<?xml version="1.0" encoding="utf-8"?>
<downloaddata>
    <downloaditem itemid="1">
    <title>Abdul kalaam Inspirational Talk</title>
    <downloadlink>http://o-o.preferred.spectranet-blr1.v8.lscache4.c.youtube.com/videoplayback?upn=Rxb-DvFeBTE&sparams=cp%2Cid%2Cip%2Cipbits%2Citag%2Cratebypass%2Csource%2Cupn%2Cexpire&fexp=906512%2C907217%2C907335%2C921602%2C919306%2C919316%2C904455%2C919324%2C904452&itag=18&ip=203.0.0.0&signature=96D7FA17DF684B4C2CD30F12251F3263C83EC443.05F62E98E1059BB44459ABF319F50DC4B7E6D90E&sver=3&ratebypass=yes&source=youtube&expire=1337691481&key=yt1&ipbits=8&cp=U0hSTFZUT19NS0NOMl9OTlNFOmlwaTFSSGFfd3NK&id=67ffa1d50864f57d&title=Abdul%20Kalam%20inspirational%20Speech%20on%20Leadership%20and%20Motivation</downloadlink>
    </downloaditem>
</downloaddata>

It seems that the parsing is failing when the data for the downloadlink tag is as above. I have tried to replace the data with something else of the same length, and it works.

Below is the Android code I am using.

import java.io.File;
import java.io.IOException;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;

import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

import android.os.Environment;

public class Wilxmlparser extends DefaultHandler{

List<VideoDetails> downloadList;
private String tempVal;
private VideoDetails tempVidDet;

public Wilxmlparser(){

}

public void parseXML() {

//get a factory
SAXParserFactory spf = SAXParserFactory.newInstance();
try {

    //get a new instance of parser
    SAXParser sp = spf.newSAXParser();

    File downloadInfo = new File(Environment.getExternalStorageDirectory() + "/watchitlater/config/downloadinfo1.xml");
    //parse the file and also register this class for call backs
    sp.parse(downloadInfo, this);

}catch(SAXException se) {
    se.printStackTrace();
}catch(ParserConfigurationException pce) {
    pce.printStackTrace();
}catch (IOException ie) {
    ie.printStackTrace();
}
}


//Event Handlers
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
//reset
tempVal = "";
if(qName.equalsIgnoreCase("downloaditem")) {
    tempVidDet = new VideoDetails();
    tempVidDet.setItemId(Integer.parseInt(attributes.getValue("itemid")));
    }
}

@Override
public void characters(char[] ch, int start, int length) throws SAXException {
tempVal = new String(ch,start,length);
}

@Override
public void endElement(String uri, String localName, String qName) throws SAXException {

if(qName.equalsIgnoreCase("downloaditem")) {
downloadList.add(tempVidDet);
}else if (qName.equalsIgnoreCase("title")) {
    tempVidDet.setTitle(tempVal);
}else if (qName.equalsIgnoreCase("downloadlink")) {
    tempVidDet.setDownloadLink(tempVal);        
    }
}
}

With the XML file above, this code never gets a callback to endElement. However, if the XML looks like this:

<?xml version="1.0" encoding="utf-8"?>
<downloaddata>
    <downloaditem itemid="1">
        <title>Abdul kalaam Inspirational Talk</title>
        <downloadlink>http://www.gmail.com/hello/world/sdfsdf%20.@@%!@#    ($dwe</downloadlink>
    </downloaditem>
</downloaddata>

or

<?xml version="1.0" encoding="utf-8"?>
<downloaddata>
    <downloaditem itemid="1">
        <title>Abdul kalaam Inspirational Talk</title>
            <downloadlink>httphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttphttpa</downloadlink>
    </downloaditem>
</downloaddata>

Then it works fine. What am I doing wrong?

Upvotes: 3

Views: 565

Answers (2)

Don Roby

Reputation: 41137

The reason your parser cannot parse the XML in question is that it is not well-formed XML. The URL in the downloadlink element contains unescaped & characters, which must be escaped (as &amp;) in XML text. See "Characters and escaping" in the Wikipedia article on XML for further info.

This is best corrected in whatever is producing the XML, and the easiest fix would be to wrap the offending text in a CDATA section.
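To make the difference concrete, here is a minimal sketch that checks three variants of the link text for well-formedness: the raw URL, the URL with & escaped as &amp;, and the URL wrapped in a CDATA section. The class name EscapingDemo, the helpers isWellFormed and wrap, and the shortened example.com URL are made up for illustration; only the standard javax.xml.parsers and org.xml.sax classes are assumed.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class EscapingDemo {

    // Hypothetical helper: true if the string parses as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // DefaultHandler rethrows fatal errors such as a bare '&'.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical helper: puts the given text inside a downloadlink element.
    static String wrap(String linkContent) {
        return "<downloaddata><downloaditem itemid=\"1\"><downloadlink>"
                + linkContent
                + "</downloadlink></downloaditem></downloaddata>";
    }

    public static void main(String[] args) {
        // Shortened stand-in for the real URL; the '&' characters are what matter.
        String url = "http://example.com/videoplayback?upn=abc&itag=18&sver=3";

        // Raw URL: the bare '&' makes the document not well-formed.
        System.out.println(isWellFormed(wrap(url)));
        // Escaped: every '&' written as '&amp;'.
        System.out.println(isWellFormed(wrap(url.replace("&", "&amp;"))));
        // CDATA: the text is taken literally (it must not contain "]]>").
        System.out.println(isWellFormed(wrap("<![CDATA[" + url + "]]>")));
    }
}
```

If the producer escapes instead of using CDATA, a < in the text would need to become &lt; as well; the sketch only replaces & because that is the only special character in this URL.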

Once the data is fixed, you may also run into a problem caused by a misconception in your parsing code.

@Override
public void characters(char[] ch, int start, int length) throws SAXException {
   tempVal = new String(ch,start,length);
}

will not always get all the characters between the start and end tags, because the contract for this method allows it to be called more than once for a single run of text. Instead of simply copying into a string, you need to append to a string buffer that is reset in the startElement method and read in the endElement method.
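A minimal sketch of that accumulation pattern follows. To stay self-contained it collects the link text into a list of strings rather than your VideoDetails objects; the class name LinkCollector, the parseLinks helper, and the example.com URL are assumptions for illustration, not part of your code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class LinkCollector extends DefaultHandler {

    final List<String> links = new ArrayList<>();
    // Accumulates text across however many characters() calls the parser makes.
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        // Reset the buffer at the start of every element.
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Append instead of overwrite: this may be only one chunk of the text.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equalsIgnoreCase("downloadlink")) {
            links.add(text.toString());
        }
    }

    // Hypothetical convenience method: parses a string and returns the links found.
    static List<String> parseLinks(String xml) {
        LinkCollector handler = new LinkCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.links;
    }

    public static void main(String[] args) {
        String xml = "<downloaddata><downloaditem itemid=\"1\">"
                + "<downloadlink>http://example.com/v?a=1&amp;b=2</downloadlink>"
                + "</downloaditem></downloaddata>";
        // The full link is recovered even if the parser splits it into chunks.
        System.out.println(parseLinks(xml));
    }
}
```

Text containing entity references such as &amp; is a common place for a parser to deliver several chunks, so an overwrite in characters could leave you with only the tail of the URL once the data is escaped properly.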

See my answer to another SO question for a bit more on this characters method parsing issue.

Upvotes: 1
