Reputation: 1226
I am making a practice application with the goal of reading data from an RSS feed.
So far it has gone well, except that my application has a problem with special characters: within a node it reads only as far as the first special character, and then moves on to the next node.
Any help would be much appreciated, and sorry for the large code blocks that follow.
RSS Feed - www.usu.co.nz/usu-news/rss.xml
<title>Unitec hosts American film students</title>
<link>http://www.usu.co.nz/node/4640</link>
<description>&lt;p&gt;If you’ve been hearing American accents around the Mt Albert campus over the past week.</description>
Display Code
String xml = XMLFunctions.getXML();
Document doc = XMLFunctions.XMLfromString(xml);
NodeList nodes = doc.getElementsByTagName("item");
for (int i = 0; i < nodes.getLength(); i++)
{
    Element e = (Element)nodes.item(i);
    Log.v("XMLTest", XMLFunctions.getValue(e, "title"));
    Log.v("XMLTest", XMLFunctions.getValue(e, "link"));
    Log.v("XMLTest", XMLFunctions.getValue(e, "description"));
    Log.v("XMLTest", XMLFunctions.getValue(e, "pubDate"));
    Log.v("XMLTest", XMLFunctions.getValue(e, "dc:creator"));
}
Reader Code
public class XMLFunctions
{
    public final static Document XMLfromString(String xml)
    {
        Document doc = null;
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        try {
            DocumentBuilder db = dbf.newDocumentBuilder();
            InputSource is = new InputSource();
            is.setCharacterStream(new StringReader(xml));
            doc = db.parse(is);
        } catch (ParserConfigurationException e) {
            System.out.println("XML parse error: " + e.getMessage());
            return null;
        } catch (SAXException e) {
            System.out.println("Wrong XML file structure: " + e.getMessage());
            return null;
        } catch (IOException e) {
            System.out.println("I/O exception: " + e.getMessage());
            return null;
        }
        return doc;
    }

    /** Returns element value
     * @param elem element (it is XML tag)
     * @return Element value otherwise empty String
     */
    public final static String getElementValue( Node elem ) {
        Node kid;
        if(elem != null)
        {
            if (elem.hasChildNodes())
            {
                for(kid = elem.getFirstChild(); kid != null; kid = kid.getNextSibling())
                {
                    if( kid.getNodeType() == Node.TEXT_NODE )
                    {
                        return kid.getNodeValue();
                    }
                }
            }
        }
        return "";
    }

    public static String getXML(){
        String line = null;
        try {
            DefaultHttpClient httpClient = new DefaultHttpClient();
            HttpPost httpPost = new HttpPost("http://www.usu.co.nz/usu-news/rss.xml");
            HttpResponse httpResponse = httpClient.execute(httpPost);
            HttpEntity httpEntity = httpResponse.getEntity();
            line = EntityUtils.toString(httpEntity);
        } catch (UnsupportedEncodingException e) {
            line = "<results status=\"error\"><msg>Can't connect to server</msg></results>";
        } catch (MalformedURLException e) {
            line = "<results status=\"error\"><msg>Can't connect to server</msg></results>";
        } catch (IOException e) {
            line = "<results status=\"error\"><msg>Can't connect to server</msg></results>";
        }
        return line;
    }

    public static int numResults(Document doc){
        Node results = doc.getDocumentElement();
        int res = -1;
        try{
            res = Integer.valueOf(results.getAttributes().getNamedItem("count").getNodeValue());
        }catch(Exception e ){
            res = -1;
        }
        return res;
    }

    public static String getValue(Element item, String str) {
        NodeList n = item.getElementsByTagName(str);
        return XMLFunctions.getElementValue(n.item(0));
    }
}
Output
Unitec hosts American film students
http://www.usu.co.nz/node/4640
<
Wed, 01 Aug 2012 05:43:22 +0000
Phillipa
Upvotes: 2
Views: 1730
Reputation: 41137
Your function
public final static String getElementValue( Node elem ) {
    Node kid;
    if(elem != null)
    {
        if (elem.hasChildNodes())
        {
            for(kid = elem.getFirstChild(); kid != null; kid = kid.getNextSibling())
            {
                if( kid.getNodeType() == Node.TEXT_NODE )
                {
                    return kid.getNodeValue();
                }
            }
        }
    }
    return "";
}
is returning the first text node under the given element. A chunk of text within a single tag can be split into multiple text nodes, and this tends to happen in the presence of special characters.
You should probably append all the text nodes into a string for the return value.
Something approximately like this might work:
public final static String getElementValue( Node elem ) {
    if ((elem == null) || (!(elem.hasChildNodes())))
        return "";
    Node kid;
    StringBuilder builder = new StringBuilder();
    for(kid = elem.getFirstChild(); kid != null; kid = kid.getNextSibling())
    {
        if( kid.getNodeType() == Node.TEXT_NODE )
        {
            builder.append(kid.getNodeValue());
        }
    }
    return builder.toString();
}
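To try the concatenating version outside the app, here is a minimal self-contained sketch that runs on a desktop JVM. The class name ConcatTextDemo, the helper descriptionOf and the one-item sample XML are made up for illustration; they are not part of the original code or the real feed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ConcatTextDemo {
    // Joins every direct TEXT_NODE child instead of returning only the first one.
    static String getElementValue(Node elem) {
        if (elem == null || !elem.hasChildNodes()) {
            return "";
        }
        StringBuilder builder = new StringBuilder();
        for (Node kid = elem.getFirstChild(); kid != null; kid = kid.getNextSibling()) {
            if (kid.getNodeType() == Node.TEXT_NODE) {
                builder.append(kid.getNodeValue());
            }
        }
        return builder.toString();
    }

    // Hypothetical helper: parses the given XML and returns the text of the
    // first <description> element.
    static String descriptionOf(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = db.parse(new InputSource(new StringReader(xml)));
            return getElementValue(doc.getElementsByTagName("description").item(0));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample item with escaped markup, similar in shape to the feed.
        String xml = "<item><description>&lt;p&gt;Hello &amp; welcome.</description></item>";
        System.out.println(descriptionOf(xml)); // prints "<p>Hello & welcome."
    }
}
```

Because every text child is appended, the result is the same whether the parser delivers the description as one text node or as several.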
Upvotes: 2
Reputation: 27614
Slightly off-topic, but you might want to check out one of the already existing RSS frameworks, like ROME. Better than re-inventing the wheel.
Upvotes: 1
Reputation: 122394
Your code only extracts the first child text node from the element. The DOM spec allows multiple adjacent text nodes, so I suspect what's happening here is that your parser is representing the <, p, > and the remaining text as (at least) four separate text nodes. You will either need to concatenate the nodes together into one string, or call normalize() on the containing element node (which modifies the DOM tree to merge adjacent text nodes into one).
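As a rough illustration of the normalize() route, the sketch below builds the split text nodes by hand rather than relying on any particular parser's behaviour, since not every parser splits text this way. The class name NormalizeDemo, its helpers and the sample text are assumptions for the example only.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class NormalizeDemo {
    // Builds a <description> element holding four adjacent text nodes,
    // mimicking a parser that splits the text at each escaped character.
    static Element buildSplitDescription() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element description = doc.createElement("description");
            for (String part : new String[] {"<", "p", ">", "Hello"}) {
                description.appendChild(doc.createTextNode(part));
            }
            return description;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Value of the first child node only, which is what the original code reads.
    static String firstText(Element e) {
        return e.getFirstChild().getNodeValue();
    }

    public static void main(String[] args) {
        Element description = buildSplitDescription();
        System.out.println(description.getChildNodes().getLength()); // prints 4
        System.out.println(firstText(description));                  // prints "<"

        description.normalize(); // merges adjacent text nodes in place

        System.out.println(description.getChildNodes().getLength()); // prints 1
        System.out.println(firstText(description));                  // prints "<p>Hello"
    }
}
```

In the question's code, the equivalent would be to call normalize() on the element (or on the whole document) before reading the first text child.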
There are various libraries that can help you. For example, if your application uses the Spring framework then org.springframework.util.xml.DomUtils has a getTextValue static method that will extract the complete text value from an element.
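If pulling in Spring is not an option, the standard DOM Level 3 method Node.getTextContent() does a similar job. This is a swapped-in alternative rather than the Spring helper itself, and it assumes the DOM implementation on your platform supports DOM Level 3. The class name TextContentDemo, the helper textOf and the sample XML are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class TextContentDemo {
    // Hypothetical helper: returns the full text of the first element with the
    // given tag name, or "" if there is no such element.
    static String textOf(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node node = doc.getElementsByTagName(tag).item(0);
            // getTextContent() concatenates the text of all descendants.
            return node == null ? "" : node.getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample item with escaped markup.
        String xml = "<item><description>&lt;p&gt;Hello</description></item>";
        System.out.println(textOf(xml, "description")); // prints "<p>Hello"
    }
}
```

One difference to keep in mind: getTextContent() also includes text from nested child elements, whereas concatenating only the direct text nodes does not.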
Upvotes: 3
Reputation: 33
Are you sure the XML string is not being altered by DefaultHttpClient? I tried your code, changing XMLFunctions.getXML() to return the XML string directly instead of fetching it with DefaultHttpClient, and the output was:
Unitec hosts American film students
http://www.usu.co.nz/node/4640
<p>If you’ve been hearing American accents around the Mt Albert campus over the past week.
as expected.
Upvotes: 0
Reputation: 5654
The XML declaration <?xml version="1.0" encoding="UTF-8"?> seems to be missing. Also, there is no root element.
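The snippet in the question may just be an excerpt of one item rather than the whole feed, so the following is only a sketch for checking these two points with the standard JAXP parser. In XML 1.0 the declaration is optional, while a single root element is required. The class name WellFormedDemo, the helper parses and the sample strings are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class WellFormedDemo {
    // Hypothetical helper: true if the string parses as a well-formed document.
    static boolean parses(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            db.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // The parser rejected the input as not well-formed.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // No XML declaration, but a single root element: accepted.
        System.out.println(parses("<rss><channel><title>News</title></channel></rss>")); // prints true
        // Two top-level elements and no root: rejected.
        System.out.println(parses("<title>News</title><link>http://example.com</link>")); // prints false
    }
}
```

The default error handler may also print a "[Fatal Error]" line to standard error for the second input.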
Upvotes: 0