Reputation: 1757
I call a web service that returns some HTML enclosed in an XML envelope, something like:
<xml version="1.0" cache="false">
<text color="white">
<p> Some text <br /> <p>
</text>
</xml>
I use XmlPullParser to parse this XML/HTML. To get the text in the <text> element, I do the following:
case XmlPullParser.START_TAG:
xmlNodeName = parser.getName();
if (xmlNodeName.equalsIgnoreCase("text")) {
String color = parser.getAttributeValue(null, "color");
String text = parser.nextText();
if (color.equalsIgnoreCase("white")) {
detail.setDetail(Html.fromHtml(text).toString());
}
}
break;
This works well and gets the text or HTML in the element, even if it contains some HTML tags.
The issue arises when the element's data starts with a <p> tag, as in the example above. In that case the data is lost and text is empty.
How can I resolve this?
EDIT
Thanks to Nik & rajesh for pointing out that my service's response is actually not valid XML and that the <p> element is not closed properly. But I have no control over the service, so I cannot change what it returns. Is there something like HTML Agility Pack that can parse any kind of malformed HTML, or that can at least give me what is inside particular tags, such as <text> ... </text> in my case? That would also be good.
Or anything else I can use to parse what I get from the service would be fine, as long as it is reasonably easy to implement.
Excuse my bad English.
Upvotes: 2
Views: 2604
Reputation: 1757
Inspired by Martin's approach of first converting the received data to a string, I solved my problem with a mixed approach.
First, convert the received InputStream's contents to a string and replace the erroneous tags with "" (or whatever you wish), as follows:
// Read the service response line by line, dropping the <p> and </p> tags.
InputStreamReader isr = new InputStreamReader(serviceReturnedStream);
BufferedReader br = new BufferedReader(isr);
StringBuilder xmlAsString = new StringBuilder(512);
String line;
try {
while ((line = br.readLine()) != null) {
// readLine() strips the line terminator, so put one back to keep lines apart.
xmlAsString.append(line.replace("<p>", "").replace("</p>", "")).append('\n');
}
} catch (IOException e) {
e.printStackTrace();
}
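As a self-contained variant of the same idea, here is a minimal sketch that can be run outside Android. The class and method names are made up for illustration, UTF-8 is an assumption about the service's encoding, and the regular expression (which also catches upper-case tags and <p> tags with attributes) is my own addition rather than part of the code above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
class ResponseSanitizer {

    // Reads the whole stream as text and removes every <p ...> and </p> tag.
    static String readWithoutParagraphTags(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder(512);
        // Assumption: the service sends UTF-8. Adjust if it does not.
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                // readLine() drops the line terminator, so add one back.
                sb.append(line).append('\n');
            }
        }
        // (?i) makes the match case-insensitive; the optional group allows
        // attributes, while tags such as <pre> or <param> are left untouched.
        return sb.toString().replaceAll("(?i)</?p(\\s[^>]*)?>", "");
    }
}
```

Note that only the <p> tags are removed; other markup such as <br /> stays in the string.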
Now I have a string that contains correct XML data (for my case), so I can just use the normal XmlPullParser to parse it instead of parsing it manually myself:
XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
factory.setNamespaceAware(false);
XmlPullParser parser = factory.newPullParser();
parser.setInput(new StringReader(xmlAsString.toString()));
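If stripping tags is not enough to make a payload well-formed, a parser-free fallback is to pull the raw body of <text> straight out of the string. The following is only a sketch: the class and method names are hypothetical, and it assumes there is a single, non-nested <text> element in the response.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class TextBodyExtractor {

    // Matches <text ...>body</text>, ignoring case and spanning line breaks.
    // The reluctant (.*?) stops at the first closing </text>.
    private static final Pattern TEXT_BODY = Pattern.compile(
            "<text\\b[^>]*>(.*?)</text\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the trimmed content of the first <text> element,
    // or null when the payload has no such element.
    static String extractTextBody(String payload) {
        Matcher m = TEXT_BODY.matcher(payload);
        return m.find() ? m.group(1).trim() : null;
    }
}
```

The returned string is still HTML (malformed or not), so it can be handed to Html.fromHtml as in the question.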
Hope this helps someone!
Upvotes: 1
Reputation: 15774
You are seeing that behavior because what you have inside the <text>...</text> tags is not a text node but child XML elements. You should enclose the contents in a CDATA section.
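To illustrate what a CDATA section changes, here is a minimal sketch using the JDK's DOM parser as a stand-in for XmlPullParser, so that it runs outside Android. The class name and the <response> root element are made up for the example; the point is only that the markup inside CDATA comes back as plain character data.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of any library.
class CdataDemo {

    // Parses the XML and returns the character data of the first <text> element.
    static String readText(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName("text").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // The unclosed <p> is harmless here because the parser never
        // interprets anything inside the CDATA section as markup.
        String xml = "<response><text color=\"white\">"
                + "<![CDATA[<p> Some text <br /> <p>]]>"
                + "</text></response>";
        System.out.println(readText(xml));
    }
}
```

This only helps if the producer of the XML can be changed to emit the CDATA wrapper.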
Edit: Here is the code segment for the suggestion I made in the comment. It does indeed work with the sample XML you gave.
StringBuilder html = new StringBuilder();
boolean isText = false; // true while the parser is inside <text>...</text>
int eventType = parser.getEventType();
while (eventType != XmlPullParser.END_DOCUMENT) {
if(eventType == XmlPullParser.START_TAG) {
String name = parser.getName();
if(name.equalsIgnoreCase("text")){
isText = true;
}else if(isText){
html.append("<");
html.append(name);
html.append(">");
}
} else if(eventType == XmlPullParser.END_TAG) {
String name = parser.getName();
if(name.equalsIgnoreCase("text")){
isText = false;
}else if(isText){
html.append("</");
html.append(name);
html.append(">");
}
} else if(eventType == XmlPullParser.TEXT) {
if(isText){
html.append(parser.getText());
}
}
eventType = parser.next();
}
Upvotes: 3
Reputation: 16194
Because in the XML above you don't close the <p> tag: the second <p> should be </p>.
<p> Some text <br /> </p>
Use this line instead.
Upvotes: 2