Jesper Jensen

Reputation: 595

SAX XML Parser giving troubles with special characters

First of all, I'm new to the Java/Android development world, so bear with me; I might ask some relatively newbie-ish questions :).
Anyway, I've been fiddling with this problem almost all day now. I cannot figure out a solution by myself, and I've searched the web thoroughly for ideas to work around it.

I'm trying to develop an Android app which parses data from an external XML file.

My parser looks like this:



    public class NewSAXHandler implements ContentHandler
    {
        private String DEBUGTAG = "NewSAXHandler";

        public static setNews news = null;
        boolean currentElement = false;
        String currentValue = null;



        public static setNews getNews()
        {
            return news;
        }

        public static void setNewsList(setNews news)
        {
            NewSAXHandler.news = news;
        }

        @Override
        public void startDocument() throws SAXException {
         // TODO Auto-generated method stub
        }

        @Override
        public void endDocument() throws SAXException {
         // TODO Auto-generated method stub
        }       

        @Override
        public void startElement(String uri, String localName, String qname, Attributes attr) throws SAXException
        {
            currentElement = true;
            if (localName.equalsIgnoreCase("channel"))
            {
                news = new setNews();
            }
            Log.d(DEBUGTAG, localName);
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException
        {
            if (localName.equalsIgnoreCase("title"))
            {
                news.setHeadline(currentValue);
                Log.d(DEBUGTAG, localName);
                Log.d(DEBUGTAG, currentValue);          
            }
            else if (localName.equalsIgnoreCase("pubdate"))
            {
                news.setDate(currentValue);
                Log.d(DEBUGTAG, localName);
                Log.d(DEBUGTAG, currentValue);          
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException
        {   
            if (currentElement)
            {
                currentValue = new String(ch, start, length).replaceAll("\\r\\n|\\r|\\n", " ");
                currentElement = false;
            }
        }

        @Override
        public void ignorableWhitespace(char[] ch, int start, int length)throws SAXException
        {

        }

        @Override
        public void endPrefixMapping(String prefix) throws SAXException
        {

        }

        @Override
        public void processingInstruction(String target, String data)throws SAXException
        {

        }

        @Override
        public void setDocumentLocator(Locator locator)
        {

        }

        @Override
        public void skippedEntity(String name) throws SAXException
        {

        }

        @Override
        public void startPrefixMapping(String prefix, String uri)throws SAXException
        {

        }   
    } 

And the XML file is parsed from:

http://www.hltv.org/news.rss.php

Here is the log when I run the app:



    10-24 20:03:32.901: D/NewSAXHandler(975): rss
    10-24 20:03:32.901: D/NewSAXHandler(975): channel
    10-24 20:03:32.901: D/NewSAXHandler(975): title
    10-24 20:03:32.901: D/NewSAXHandler(975): title
    10-24 20:03:32.901: D/NewSAXHandler(975): www.HLTV.org News
    10-24 20:03:32.901: D/NewSAXHandler(975): link
    10-24 20:03:32.912: D/NewSAXHandler(975): description
    10-24 20:03:32.912: D/NewSAXHandler(975): item
    10-24 20:03:32.912: D/NewSAXHandler(975): title
    10-24 20:03:32.912: D/NewSAXHandler(975): title
    10-24 20:03:32.912: D/NewSAXHandler(975): http://www.hltv.org/HLTV.org News
    10-24 20:03:32.912: D/NewSAXHandler(975): Photos: Final ones from ESWC
    10-24 20:03:32.912: D/NewSAXHandler(975): link
    10-24 20:03:32.912: D/NewSAXHandler(975): pubDate
    10-24 20:03:32.922: D/NewSAXHandler(975): pubDate
    10-24 20:03:32.922: D/NewSAXHandler(975): http://www.hltv.org/news/7692-photos-final-ones-from-eswcMon, 24 Oct 2011 21:17:00 +0200
    10-24 20:03:32.922: D/NewSAXHandler(975): item
    10-24 20:03:32.922: D/NewSAXHandler(975): title
    10-24 20:03:32.932: W/System.err(975): org.apache.harmony.xml.ExpatParser$ParseException: At line 16, column 23: not well-formed (invalid token)
    10-24 20:03:32.942: W/System.err(975):  at org.apache.harmony.xml.ExpatParser.parseFragment(ExpatParser.java:520)
    10-24 20:03:32.952: W/System.err(975):  at org.apache.harmony.xml.ExpatParser.parseDocument(ExpatParser.java:479)
    10-24 20:03:32.952: W/System.err(975):  at org.apache.harmony.xml.ExpatReader.parse(ExpatReader.java:318)
    10-24 20:03:32.952: W/System.err(975):  at org.apache.harmony.xml.ExpatReader.parse(ExpatReader.java:275)
    10-24 20:03:32.962: W/System.err(975):  at jj.rssReader.hltvorg.Hltvorg.onCreate(Hltvorg.java:49)
    10-24 20:03:32.962: W/System.err(975):  at android.app.Instrumentation.callActivityOnCreate(Instrumentation.java:1047)
    10-24 20:03:32.962: W/System.err(975):  at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:1611)
    10-24 20:03:32.971: W/System.err(975):  at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:1663)
    10-24 20:03:32.971: W/System.err(975):  at android.app.ActivityThread.access$1500(ActivityThread.java:117)
    10-24 20:03:32.981: W/System.err(975):  at android.app.ActivityThread$H.handleMessage(ActivityThread.java:931)
    10-24 20:03:32.981: W/System.err(975):  at android.os.Handler.dispatchMessage(Handler.java:99)
    10-24 20:03:32.981: W/System.err(975):  at android.os.Looper.loop(Looper.java:123)
    10-24 20:03:32.992: W/System.err(975):  at android.app.ActivityThread.main(ActivityThread.java:3683)
    10-24 20:03:32.992: W/System.err(975):  at java.lang.reflect.Method.invokeNative(Native Method)
    10-24 20:03:33.002: W/System.err(975):  at java.lang.reflect.Method.invoke(Method.java:507)
    10-24 20:03:33.002: W/System.err(975):  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:839)
    10-24 20:03:33.002: W/System.err(975):  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:597)
    10-24 20:03:33.013: W/System.err(975):  at dalvik.system.NativeStart.main(Native Method)

It seems like the error is coming from the ´ character.
The XML file does not declare an encoding, so I cannot tell which one it uses, but I guess it is UTF-8.
I've also tried using a StringBuilder to store each character, without any luck.

I thought the XML parser would convert those special characters by itself, but it seems it doesn't like them.

If I try to parse this file:

http://www.hltv.org/forum.rss.php

then it works better.

Does anyone have any ideas?

If you need any more of my code, please say so :)

Best Regards,
Jesper

Upvotes: 0

Views: 2485

Answers (1)

Jesper Jensen

Reputation: 595

The problem was the encoding, as Philipp said above.

I just added the following to my code:

    InputSource is = new InputSource(url.openStream());
    is.setEncoding("ISO-8859-1");
    Reader.parse(is);
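
For anyone who wants to try this outside of an Android project, here is a minimal, self-contained sketch of the same idea on a plain JVM. The class name `Latin1FeedExample`, the `TitleHandler` helper, the `parseTitle` method and the sample feed string are all made up for illustration; only the `InputSource.setEncoding` call corresponds to the fix above. It assumes the feed bytes really are ISO-8859-1 and that the document carries no encoding declaration of its own.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class Latin1FeedExample {

    /** Hypothetical handler: collects the text of the first <title> element. */
    static class TitleHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        private boolean inTitle = false;
        String title = null;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            // The default factory is not namespace-aware, so match on qName.
            if (title == null && "title".equals(qName)) {
                inTitle = true;
                text.setLength(0);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // characters() may be called several times per element, so append.
            if (inTitle) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (inTitle && "title".equals(qName)) {
                title = text.toString();
                inTitle = false;
            }
        }
    }

    /** Parses the stream with an explicitly chosen encoding and returns the first title. */
    static String parseTitle(InputStream in, String encoding) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        TitleHandler handler = new TitleHandler();
        reader.setContentHandler(handler);

        InputSource source = new InputSource(in);
        source.setEncoding(encoding); // tell the parser how to decode the raw bytes
        reader.parse(source);
        return handler.title;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample feed, encoded as ISO-8859-1 and without an XML declaration.
        byte[] feed = "<rss><channel><title>ESWC\u00B4s final photos</title></channel></rss>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(parseTitle(new ByteArrayInputStream(feed), "ISO-8859-1"));
    }
}
```

In ISO-8859-1 the ´ character is the single byte 0xB4, which is not a valid byte sequence on its own in UTF-8. That would explain why a parser that assumes UTF-8 reports an invalid token at that position.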

Upvotes: 2
