Reputation: 1144
I am trying to parse HTML for specific data but am running into problems that I think are caused by return characters. Since I know beforehand what I am looking for, I use a simple substring method to take the HTML apart.
Here is my parse method:
public static void parse(String response, String[] hashItem, String[][] startEnd) throws Exception
{
    for (int i = 0; i < hashItem.length; i++)
    {
        // Everything after the start marker for this item.
        String part = response.substring(response.indexOf(startEnd[i][0]) + startEnd[i][0].length());
        // The value is the text up to the end marker.
        String value = part.substring(0, part.indexOf(startEnd[i][1]));
        DATABASE.setHash(hashItem[i], value);
    }
}
Here is a sample of the HTML that is giving me issues:
<table cellspacing=0 cellpadding=2 class=smallfont>
<tr onclick="lu();" onmouseover="style.cursor='hand'">
<td class=bodybox nowrap> 21,773,177,147 $ </td><td></td>
<td class=bodybox nowrap> 629,991,926 F </td><td></td>
<td class=bodybox nowrap> 24,537 P </td><td></td>
<td class=bodybox nowrap> 0 T </td>
<td></td><td class=bodybox nowrap> RT </td>
The HTML contains hidden return characters, but when I try to include them in the marker strings I search for, it works poorly, if at all. Is there a method, or perhaps a better way, to strip hidden characters from the HTML to make it easier to parse? Any help is greatly appreciated, as always.
Upvotes: 3
Views: 7172
Reputation: 22930
You can try XmlPullParser, which is available in Android. You can use a StringBuffer to append the characters between tags.
Upvotes: 1
Reputation: 1
As far as I know, you can parse the HTML file using an XMLReader, for example. See this article: http://www.ibm.com/developerworks/xml/library/x-andbene1/
Upvotes: 0
Reputation: 7128
If you want to make parsing very easy, try Jsoup.
This example downloads the page, parses it, and gets the text of each matching cell:
Document doc = Jsoup.connect("http://jsoup.org").get();
// Select every <td> element with class "bodybox".
Elements tds = doc.select("td.bodybox");
for (Element td : tds) {
    String tdText = td.text();
}
Upvotes: 8
Reputation: 6376
Try using a regex to extract the information you want: http://java.sun.com/developer/technicalArticles/releases/1.4regex/
You could even use it to remove the hidden characters. Or maybe use String.replace to remove the newline characters?
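As a rough illustration of both suggestions, here is a minimal sketch using only java.util.regex. It assumes the cells of interest look like the `<td class=bodybox nowrap>` cells in the question's sample. The class and method names (`BodyboxCells`, `normalize`, `extractCells`) are made up for this example and are not from the original post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for illustration only.
class BodyboxCells {
    // Captures the contents of each <td class=bodybox ...> cell.
    // DOTALL lets '.' match line breaks, so hidden \r or \n inside a cell do not break the match.
    private static final Pattern CELL = Pattern.compile(
            "<td\\s+class=bodybox[^>]*>(.*?)</td>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Collapses every run of whitespace (spaces, tabs, \r, \n) into one space and trims the ends.
    static String normalize(String text) {
        return text.replaceAll("\\s+", " ").trim();
    }

    // Returns the normalized text of every bodybox cell, in document order.
    static List<String> extractCells(String html) {
        List<String> values = new ArrayList<>();
        Matcher m = CELL.matcher(html);
        while (m.find()) {
            values.add(normalize(m.group(1)));
        }
        return values;
    }

    public static void main(String[] args) {
        // Sample input modeled on the question's HTML, with explicit line breaks added.
        String html = "<tr>\r\n<td class=bodybox nowrap> 21,773,177,147 $ </td><td></td>\r\n"
                + "<td class=bodybox nowrap>\n 24,537 P\n</td></tr>";
        System.out.println(extractCells(html));
    }
}
```

Calling `normalize` on the whole response before the substring-based `parse` method would also let the start and end markers be written without any return characters. A regex like this is tied to the exact markup shown, so it is likely to need adjusting if the page layout changes.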
Upvotes: 0