Reputation: 89
I'm trying to parse this kind of HTML for my Android app:
<table>
<p> blablabla </p>
<p> bliblibli </p>
</table>
<p> Hello </p>
<p> Hello2 </p>
....
<p> Hellon </p>
<table>
<p> blablabla </p>
<p> bliblibli </p>
</table>
It's easy to get the contents of the table tags with getElementsByTag("table") and then getElementsByTag("p").
But what about the "Hello" section, where I don't know how many lines there are?
My first idea was to cut the string up with string.split("table"), but that's kind of awful.
Thanks for the help.
Upvotes: 2
Views: 137
Reputation: 31595
This is hard with your invalid example. After jsoup parses it, the document looks like this:
<html>
<head></head>
<body>
<p> blablabla </p>
<p> bliblibli </p>
<table>
</table>
<p> Hello </p>
<p> Hello2 </p> ....
<p> Hellon </p>
<p> blablabla </p>
<p> bliblibli </p>
<table>
</table>
</body>
</html>
All paragraphs end up at the top level, and the tables are empty.
A proper table example:
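To see this for yourself, here is a minimal sketch that feeds a shortened version of the invalid markup to jsoup and counts where the paragraphs land. The class name FosterParentingDemo and the shortened HTML string are only illustrative; the counts it prints should match the parsed output shown above (no paragraphs left under the table).

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class FosterParentingDemo {
    // Shortened version of the invalid markup: <p> placed directly inside <table>.
    static final String INVALID_HTML =
            "<table><p> blablabla </p><p> bliblibli </p></table>"
            + "<p> Hello </p>";

    static Document parse() {
        return Jsoup.parse(INVALID_HTML);
    }

    public static void main(String[] args) {
        Document doc = parse();
        // Paragraphs still nested under a table after parsing.
        System.out.println(doc.select("table p").size());
        // Paragraphs that are direct children of <body>.
        System.out.println(doc.select("body > p").size());
    }
}
```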
<table>
<tr>
<td>
<p> blablabla </p>
</td>
<td>
<p> bliblibli </p>
</td>
</tr>
</table>
After fixing the sample HTML, things are much easier:
import org.jsoup.Jsoup;
import org.jsoup.select.Elements;

public class Main {
    public static void main(String[] args) {
        // Valid markup: the table paragraphs now live inside <td> cells.
        String html = "<table><tr><td>\n" +
                " <p> blablabla </p>\n" +
                " <p> bliblibli </p>\n" +
                "</td></tr></table>\n" +
                "<p> Hello </p>\n" +
                "<p> Hello2 </p>\n" +
                "....\n" +
                "<p> Hellon </p>\n" +
                "<table><tr><td>\n" +
                " <p> blablabla </p>\n" +
                " <p> bliblibli </p>\n" +
                "</td></tr></table>";
        // Select only the <p> elements that are direct children of <body>.
        Elements p1 = Jsoup.parse(html).select("body > p");
        System.out.println(p1.html());
    }
}
And the result:
Hello
Hello2
Hellon
Just use the child combinator >, it works like a charm :)
parent > child: child elements that descend directly from parent, e.g. div.content > p finds p elements; and body > * finds the direct children of the body tag
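As a rough illustration of the difference between a plain tag selector and the child combinator, the sketch below counts matches for three selectors on a small sample. The class name ChildCombinatorDemo, the count helper, and the sample HTML are made up for this example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class ChildCombinatorDemo {
    // One paragraph inside a table cell, two paragraphs directly under <body>.
    static final String HTML =
            "<table><tr><td><p> blablabla </p></td></tr></table>"
            + "<p> Hello </p>"
            + "<p> Hello2 </p>";

    // Returns how many elements match the given CSS query.
    static int count(String cssQuery) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(cssQuery).size();
    }

    public static void main(String[] args) {
        System.out.println(count("p"));        // every <p>, including the one in the table
        System.out.println(count("body > p")); // only <p> directly under <body>
        System.out.println(count("body > *")); // all direct children of <body>
    }
}
```

With this sample, "p" matches three elements, "body > p" matches two, and "body > *" matches three (the table plus the two paragraphs).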
Upvotes: 2
Reputation: 9141
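This should help:
// doc is a Document you have already parsed, e.g. with Jsoup.parse(html)
Element content = doc.select("p").get(0); // first <p> in the document
String str = content.text();
Log.d("Check", str + content.tagName());
You can call get() in a loop to go through all the paragraphs.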
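One possible way to write that loop, as a sketch: Elements is iterable, so a for-each works without calling get(i) by hand. The class name ParagraphLoopDemo and the helper topLevelParagraphs are hypothetical, and the "body > p" selector is borrowed from the other answer so that only the paragraphs outside the tables are collected; on Android you would log or display the strings instead of printing them.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class ParagraphLoopDemo {
    // Collects the text of every <p> that is a direct child of <body>.
    static List<String> topLevelParagraphs(String html) {
        List<String> texts = new ArrayList<>();
        for (Element p : Jsoup.parse(html).select("body > p")) {
            texts.add(p.text()); // text() returns the trimmed text content
        }
        return texts;
    }

    public static void main(String[] args) {
        String html = "<table><tr><td><p> blablabla </p></td></tr></table>"
                + "<p> Hello </p>"
                + "<p> Hello2 </p>";
        for (String text : topLevelParagraphs(html)) {
            System.out.println(text);
        }
    }
}
```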
Upvotes: 0