First of all, I'm new to the Java/Android development world, so bear with me; I might ask some fairly newbie-ish questions :).
Anyway, I've been struggling with this problem almost all day and cannot figure out a solution by myself, even after searching the web thoroughly for ideas to work around it.
I'm trying to develop an Android app which parses data from an external XML file.
My parser looks like this:
import android.util.Log;

import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;

public class NewSAXHandler implements ContentHandler
{
    private String DEBUGTAG = "NewSAXHandler";
    public static setNews news = null;
    // True from the start of an element until its first characters() call
    boolean currentElement = false;
    String currentValue = null;

    public static setNews getNews()
    {
        return news;
    }

    public static void setNewsList(setNews news)
    {
        NewSAXHandler.news = news;
    }

    @Override
    public void startDocument() throws SAXException {
        // TODO Auto-generated method stub
    }

    @Override
    public void endDocument() throws SAXException {
        // TODO Auto-generated method stub
    }

    @Override
    public void startElement(String uri, String localName, String qname, Attributes attr) throws SAXException
    {
        currentElement = true;
        // A new <channel> starts a fresh news container
        if (localName.equalsIgnoreCase("channel"))
            news = new setNews();
        Log.d(DEBUGTAG, localName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException
    {
        if (localName.equalsIgnoreCase("title"))
        {
            news.setHeadline(currentValue);
            Log.d(DEBUGTAG, localName);
            Log.d(DEBUGTAG, currentValue);
        }
        else if (localName.equalsIgnoreCase("pubdate"))
        {
            news.setDate(currentValue);
            Log.d(DEBUGTAG, localName);
            Log.d(DEBUGTAG, currentValue);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException
    {
        if (currentElement)
        {
            // Keep only the first chunk of text, with line breaks replaced by spaces
            currentValue = new String(ch, start, length).replaceAll("\\r\\n|\\r|\\n", " ");
            currentElement = false;
        }
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException
    {
    }

    @Override
    public void endPrefixMapping(String prefix) throws SAXException
    {
    }

    @Override
    public void processingInstruction(String target, String data) throws SAXException
    {
    }

    @Override
    public void setDocumentLocator(Locator locator)
    {
    }

    @Override
    public void skippedEntity(String name) throws SAXException
    {
    }

    @Override
    public void startPrefixMapping(String prefix, String uri) throws SAXException
    {
    }
}
And the XML file is parsed from:
http://www.hltv.org/news.rss.php
Here is the log when I run the app:
10-24 20:03:32.901: D/NewSAXHandler(975): rss
10-24 20:03:32.901: D/NewSAXHandler(975): channel
10-24 20:03:32.901: D/NewSAXHandler(975): title
10-24 20:03:32.901: D/NewSAXHandler(975): title
10-24 20:03:32.901: D/NewSAXHandler(975): www.HLTV.org News
10-24 20:03:32.901: D/NewSAXHandler(975): link
10-24 20:03:32.912: D/NewSAXHandler(975): description
10-24 20:03:32.912: D/NewSAXHandler(975): item
10-24 20:03:32.912: D/NewSAXHandler(975): title
10-24 20:03:32.912: D/NewSAXHandler(975): title
10-24 20:03:32.912: D/NewSAXHandler(975): http://www.hltv.org/HLTV.org News
10-24 20:03:32.912: D/NewSAXHandler(975): Photos: Final ones from ESWC
10-24 20:03:32.912: D/NewSAXHandler(975): link
10-24 20:03:32.912: D/NewSAXHandler(975): pubDate
10-24 20:03:32.922: D/NewSAXHandler(975): pubDate
10-24 20:03:32.922: D/NewSAXHandler(975): http://www.hltv.org/news/7692-photos-final-ones-from-eswcMon, 24 Oct 2011 21:17:00 +0200
10-24 20:03:32.922: D/NewSAXHandler(975): item
10-24 20:03:32.922: D/NewSAXHandler(975): title
10-24 20:03:32.932: W/System.err(975): org.apache.harmony.xml.ExpatParser$ParseException: At line 16, column 23: not well-formed (invalid token)
10-24 20:03:32.942: W/System.err(975): at org.apache.harmony.xml.ExpatParser.parseFragment(ExpatParser.java:520)
10-24 20:03:32.952: W/System.err(975): at org.apache.harmony.xml.ExpatParser.parseDocument(ExpatParser.java:479)
10-24 20:03:32.952: W/System.err(975): at org.apache.harmony.xml.ExpatReader.parse(ExpatReader.java:318)
10-24 20:03:32.952: W/System.err(975): at org.apache.harmony.xml.ExpatReader.parse(ExpatReader.java:275)
10-24 20:03:32.962: W/System.err(975): at jj.rssReader.hltvorg.Hltvorg.onCreate(Hltvorg.java:49)
10-24 20:03:32.962: W/System.err(975): at android.app.Instrumentation.callActivityOnCreate(Instrumentation.java:1047)
10-24 20:03:32.962: W/System.err(975): at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:1611)
10-24 20:03:32.971: W/System.err(975): at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:1663)
10-24 20:03:32.971: W/System.err(975): at android.app.ActivityThread.access$1500(ActivityThread.java:117)
10-24 20:03:32.981: W/System.err(975): at android.app.ActivityThread$H.handleMessage(ActivityThread.java:931)
10-24 20:03:32.981: W/System.err(975): at android.os.Handler.dispatchMessage(Handler.java:99)
10-24 20:03:32.981: W/System.err(975): at android.os.Looper.loop(Looper.java:123)
10-24 20:03:32.992: W/System.err(975): at android.app.ActivityThread.main(ActivityThread.java:3683)
10-24 20:03:32.992: W/System.err(975): at java.lang.reflect.Method.invokeNative(Native Method)
10-24 20:03:33.002: W/System.err(975): at java.lang.reflect.Method.invoke(Method.java:507)
10-24 20:03:33.002: W/System.err(975): at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:839)
10-24 20:03:33.002: W/System.err(975): at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:597)
10-24 20:03:33.013: W/System.err(975): at dalvik.system.NativeStart.main(Native Method)
It seems like the error is coming from the ´ character.
The XML file does not declare an encoding, so I cannot tell what it is, but I guess it is UTF-8.
I've also tried using a StringBuilder to collect the characters, without any luck.
I thought the XML parser would convert those special characters by itself, but it seems it doesn't like them.
If I try to parse this file:
http://www.hltv.org/forum.rss.php
Then it works better.
Does anyone have any ideas?
If you need any more of my code, please say so :)
Best Regards,
Jesper
Upvotes: 0
Views: 2485
The problem was the encoding, as Philipp said above.
I just added the following to my code:
InputSource is = new InputSource(url.openStream());
is.setEncoding("ISO-8859-1");
Reader.parse(is);
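
To illustrate why forcing the encoding helps, here is a minimal sketch that runs on a plain JVM rather than on Android (so it uses the JDK's bundled parser, not Expat). The class name EncodingDemo, the helper parseText, and the sample title are made up for illustration. The sample bytes stand in for a feed that is ISO-8859-1 encoded but declares no encoding, which is an assumption about the real feed.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class EncodingDemo {

    /**
     * Parses the given XML bytes and returns all character data found.
     * If encoding is non-null it is set on the InputSource, which tells the
     * parser how to decode the bytes instead of letting it assume a default.
     */
    public static String parseText(byte[] xml, String encoding) throws Exception {
        final StringBuilder text = new StringBuilder();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // characters() may be called several times per element, so append
                text.append(ch, start, length);
            }
        });
        InputSource source = new InputSource(new ByteArrayInputStream(xml));
        if (encoding != null) {
            source.setEncoding(encoding);
        }
        reader.parse(source);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample: the acute accent (U+00B4) becomes the single byte 0xB4
        byte[] xml = "<title>Na\u00B4Vi wins</title>".getBytes(StandardCharsets.ISO_8859_1);

        // With the encoding forced, the byte 0xB4 is decoded as the acute accent
        System.out.println("forced ISO-8859-1: " + parseText(xml, "ISO-8859-1"));

        // With no encoding given and none declared, the parser falls back to UTF-8,
        // where a lone 0xB4 byte is not valid, so this call is expected to fail
        try {
            System.out.println("default: " + parseText(xml, null));
        } catch (Exception e) {
            System.out.println("default failed: " + e);
        }
    }
}
```

This only works around the feed not declaring its encoding. If the feed ever switches to UTF-8, the hard-coded "ISO-8859-1" would have to change as well.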
Upvotes: 2