user2250466

Reputation: 89

JSoup - Parse simple HTML

I'm trying to parse this kind of HTML for my Android app:

<table>
    <p> blablabla </p>
    <p> bliblibli </p>
</table>
<p> Hello </p>
<p> Hello2 </p>
....
<p> Hellon </p>
<table>
    <p> blablabla </p>
    <p> bliblibli </p>
</table>

It's easy to get the contents of the table tags with getElementsByTag("table") and then getElementsByTag("p").

But what about the "Hello" section, where I don't know how many lines there are?

My first idea was to cut up the string first with string.split("table"), but that's kind of awful.

Thanks for the help.

Upvotes: 2

Views: 137

Answers (2)

MariuszS

Reputation: 31595

This is hard with your invalid example. After JSoup parses it, the document looks like this:

<html>
 <head></head>
 <body>
  <p> blablabla </p>
  <p> bliblibli </p>
  <table>   
  </table> 
  <p> Hello </p> 
  <p> Hello2 </p> .... 
  <p> Hellon </p> 
  <p> blablabla </p>
  <p> bliblibli </p>
  <table>   
  </table>
 </body>
</html>

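All paragraphs end up at the top level and the tables are empty, because a <p> is not allowed directly inside a <table> and the parser moves it out in front of the table.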

Proper table example

<table>
  <tr>
    <td>
       <p> blablabla </p>
    </td>
    <td>
       <p> bliblibli </p>
    </td>
  </tr>
</table>

After fixing the sample HTML, things are much easier:

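import org.jsoup.Jsoup;
import org.jsoup.select.Elements;

public class Main {

    public static void main(String[] args) {

        // Sample input with valid tables (tr/td added)
        String html = "<table><tr><td>\n" +
                "    <p> blablabla </p>\n" +
                "    <p> bliblibli </p>\n" +
                "</td></tr></table>\n" +
                "<p> Hello </p>\n" +
                "<p> Hello2 </p>\n" +
                "....\n" +
                "<p> Hellon </p>\n" +
                "<table><tr><td>\n" +
                "    <p> blablabla </p>\n" +
                "    <p> bliblibli </p>\n" +
                "</td></tr></table>";

        // Select only the <p> elements that are direct children of <body>
        Elements p1 = Jsoup.parse(html).select("body > p");

        // Print the inner HTML of each matched paragraph
        System.out.println(p1.html());

    }
}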

And the result:

Hello
Hello2
Hellon 

Just use the child combinator >, it works like a charm :)

parent > child: child elements that descend directly from parent, e.g. div.content > p finds p elements; and body > * finds the direct children of the body tag
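
To make the difference between the selectors concrete, here is a minimal sketch, assuming jsoup is on the classpath and the table markup has been fixed as in the sample above. The class name SelectorDemo and the helper count are made up for illustration; only Jsoup.parse, select and size come from jsoup itself.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class SelectorDemo {
    // Same shape as the fixed sample: two valid tables around three top-level paragraphs.
    static final String HTML =
            "<table><tr><td><p> blablabla </p><p> bliblibli </p></td></tr></table>"
          + "<p> Hello </p><p> Hello2 </p><p> Hellon </p>"
          + "<table><tr><td><p> blablabla </p><p> bliblibli </p></td></tr></table>";

    // Hypothetical helper: number of elements matched by a CSS query.
    static int count(String cssQuery) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(cssQuery).size();
    }

    public static void main(String[] args) {
        System.out.println(count("p"));        // every paragraph in the document
        System.out.println(count("table p"));  // paragraphs anywhere inside a table
        System.out.println(count("body > p")); // paragraphs that are direct children of body
    }
}
```

With this input, "p" should match all seven paragraphs, "table p" the four inside the tables, and "body > p" only the three "Hello" ones, however many of them there are.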

Upvotes: 2

Shabbir Dhangot

Reputation: 9141

This should help you:

          // doc is an already parsed org.jsoup.nodes.Document
          Element content = doc.select("p").get(0);  // first <p> in the document
          String str = content.text();
          Log.d("Check", str + content.tagName());

You can call get() in a loop to go through every matched element.
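
A rough sketch of such a loop, assuming jsoup is on the classpath. It iterates over the Elements result directly instead of calling get(i) by index, and it uses the "body > p" selector rather than plain "p" so that paragraphs inside tables are skipped. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class ParagraphLoop {
    // Hypothetical helper: collects the text of every <p> that is a direct child of <body>.
    static List<String> topLevelParagraphs(String html) {
        List<String> texts = new ArrayList<>();
        for (Element p : Jsoup.parse(html).select("body > p")) {
            texts.add(p.text()); // text() returns trimmed, whitespace-normalized text
        }
        return texts;
    }

    public static void main(String[] args) {
        String html = "<p> Hello </p><p> Hello2 </p><p> Hellon </p>";
        System.out.println(topLevelParagraphs(html)); // prints [Hello, Hello2, Hellon]
    }
}
```

On Android you would probably swap System.out.println for Log.d, as in the snippet above.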

Upvotes: 0
