Reputation: 12587
So, I've been working on parsing an XML file fetched from the internet (an RSS feed).
I've been following IBM's parser examples, which can be found here.
Unfortunately, when I try to parse a link that looks like this:
http://www.website.net/index.php?option=com_adsmanager&page=display&catid=87&tid=208196
my parsers only return the link as http://www.website.net/index.php?option=, and the rest of the link gets cut off.
Any thoughts on how to fix this?
Edit 1:
The SAX parser doesn't work at all. It claims that the document is not well formed, which I believe is incorrect, since the document was checked and double-checked.
Edit 2:
The link element had more than one child node: every ampersand (&) started a new node.
Therefore, the code I had:
if (name.equalsIgnoreCase(LINK)) {
    val = property.getFirstChild().getNodeValue();
    message.setLink(val);
}
was not good enough, since it only read the first child. So I changed it to this:
if (name.equalsIgnoreCase(LINK)) {
    // Concatenate the values of all child nodes, not just the first one.
    val = "";
    NodeList list = property.getChildNodes();
    for (int i = 0; i < list.getLength(); i++) {
        val += list.item(i).getNodeValue();
    }
    message.setLink(val);
}
That was the way to do it in the DOM XML feed parser. Now all I have to do is find a way to do the same in the other parsers from the IBM examples.
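For a SAX-based parser, the same idea could look roughly like the sketch below: characters() may be called several times for the text of a single element, so the pieces are appended to a buffer and only read in endElement(). This assumes the feed escapes its ampersands as &amp; (otherwise the SAX parser rejects the document). The class name SaxLinkDemo, the handler LinkHandler and the helper readLink are made up for illustration and are not taken from the IBM examples.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxLinkDemo {

    // Hypothetical handler: collects the text of the first <link> element.
    static class LinkHandler extends DefaultHandler {
        private final StringBuilder buffer = new StringBuilder();
        private boolean inLink = false;
        String link;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if ("link".equalsIgnoreCase(qName)) {
                inLink = true;
                buffer.setLength(0); // start a fresh buffer for this element
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May fire more than once per element, so append instead of assigning.
            if (inLink) {
                buffer.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("link".equalsIgnoreCase(qName)) {
                inLink = false;
                if (link == null) {
                    link = buffer.toString(); // keep only the first link
                }
            }
        }
    }

    // Parses the given XML string and returns the text of its first <link>.
    static String readLink(String xml) throws Exception {
        LinkHandler handler = new LinkHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.link;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<item><link>http://www.website.net/index.php?option=com_adsmanager"
                + "&amp;page=display&amp;catid=87&amp;tid=208196</link></item>";
        System.out.println(readLink(xml));
    }
}
```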
Upvotes: 1
Views: 376
Reputation: 12587
Well, I sort of solved this.
My second edit was the correct diagnosis of the problem.
The link element had more than one child node: every ampersand (&) started a new node.
Therefore, the code I had:
if (name.equalsIgnoreCase(LINK)) {
    val = property.getFirstChild().getNodeValue();
    message.setLink(val);
}
was not good enough, since it only read the first child. So I changed it to this:
if (name.equalsIgnoreCase(LINK)) {
    // Concatenate the values of all child nodes, not just the first one.
    val = "";
    NodeList list = property.getChildNodes();
    for (int i = 0; i < list.getLength(); i++) {
        val += list.item(i).getNodeValue();
    }
    message.setLink(val);
}
That was the way to do it in the DOM XML feed parser.
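As a self-contained illustration of the fix above, here is a minimal sketch that joins the child node values with a StringBuilder and compares the result with Node.getTextContent(), a DOM Level 3 method that should return the same text where the platform supports it. The class DomLinkDemo and the helpers joinChildren and firstLink are hypothetical names, and the sample feed assumes the ampersands are escaped as &amp;.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomLinkDemo {

    // Concatenates the values of all direct children of the element.
    static String joinChildren(Node element) {
        StringBuilder sb = new StringBuilder();
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            String value = children.item(i).getNodeValue();
            if (value != null) { // element children have no node value
                sb.append(value);
            }
        }
        return sb.toString();
    }

    // Parses the XML string and returns its first <link> element.
    static Node firstLink(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName("link").item(0);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<item><link>http://www.website.net/index.php?option=com_adsmanager"
                + "&amp;page=display&amp;catid=87&amp;tid=208196</link></item>";
        Node link = firstLink(xml);
        System.out.println(joinChildren(link));
        System.out.println(link.getTextContent());
    }
}
```

Whether the parser actually splits the text into several nodes depends on the implementation, so reading all children (or using getTextContent()) is the safer choice either way.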
Upvotes: 0
Reputation: 42870
<link>http://www.website.net/index.php?option=com_adsmanager&page=display&catid=87&tid=208196</link>
...is not well-formed XML, since the & characters do not start valid XML entity references.
There are a couple of ways to work around this:
Escape the & characters:
<link>http://www.website.net/index.php?option=com_adsmanager&amp;page=display&amp;catid=87&amp;tid=208196</link>
Wrap the link section in CDATA:
<link><![CDATA[http://www.website.net/index.php?option=com_adsmanager&page=display&catid=87&tid=208196]]></link>
If you are not in control of the RSS file creation, you will have to pre-process the document before feeding it to an XML parser. More forgiving XML parsers such as TagSoup might be helpful.
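One possible pre-processing step is sketched below: a regular expression that escapes every & which does not already begin an entity or character reference. The class AmpersandFixer and the method escapeBareAmpersands are hypothetical names. Treat this as a rough sketch, not a complete solution: it would also rewrite ampersands inside CDATA sections, where they are legal as-is, and it would leave alone a query parameter that happens to look like an entity (for example &copy;).

```java
import java.util.regex.Pattern;

public class AmpersandFixer {

    // Matches an '&' that is NOT followed by "name;", "#123;" or "#x1F;".
    private static final Pattern BARE_AMP =
            Pattern.compile("&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    // Replaces each bare ampersand with "&amp;" and leaves existing references alone.
    static String escapeBareAmpersands(String xml) {
        return BARE_AMP.matcher(xml).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        String raw = "<link>http://www.website.net/index.php?option=com_adsmanager"
                + "&page=display&catid=87&tid=208196</link>";
        System.out.println(escapeBareAmpersands(raw));
    }
}
```

The cleaned string can then be handed to the regular XML parser, for example through a StringReader wrapped in an InputSource.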
Upvotes: 1