gishara

Reputation: 845

JSoup parsing HTML table in div

I am trying to crawl the following website:

http://services2.hdb.gov.sg/webapp/BB33RTIS/BB33SSearchWidget

I connect to the site and parse the HTML table as shown below:

Document doc = Jsoup
        .connect("http://services2.hdb.gov.sg/webapp/BB33RTIS/BB33SSearchWidget")
        .data("FLAT_TYPE", "02")
        .data("NME_NEWTOWN", "BD      Bedok")
        .data("NME_STREET", "")
        .data("NUM_BLK_FROM", "")
        .data("NUM_BLK_TO", "")
        .data("dteRange", "12")
        .data("DTE_APPROVAL_FROM", "May 2015")
        .data("DTE_APPROVAL_TO", "May 2016")
        .data("AMT_RESALE_PRICE_FROM", "")
        .data("AMT_RESALE_PRICE_TO", "")
        .data("Process", "continue")
        .cookies(cookies)
        .timeout(0)
        .post();

Element table = doc.getElementsByTag("table").first();

I also tried the following, but the table was still null:

Element tableBody = doc.select("div[class=content]").select("table").first();

However, the table is always empty. Can someone tell me what I am doing wrong? Thanks in advance.

Upvotes: 1

Views: 260

Answers (2)

Sestertius

Reputation: 1372

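You must add another parameter to your request: the hidden input field that the search page includes in its form. Its name and value have to be read from the page first and then sent along with the other form data.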
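If it is unclear which hidden field the server expects, one option is to collect every hidden input from the form page and send them all. The following is only a minimal sketch under assumptions: the markup and the field name `exampleFlag` are hypothetical and are not taken from the real page, and the class and method names are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class HiddenFields {

    // Collects name -> value for every named hidden input in the document.
    static Map<String, String> hiddenInputs(Document page) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (Element input : page.select("input[type=hidden]")) {
            String name = input.attr("name");
            if (!name.isEmpty()) {
                fields.put(name, input.attr("value"));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical markup; the real form's field names will differ.
        String html = "<form><div class=\"row\">"
                + "<input type=\"hidden\" name=\"exampleFlag\" value=\"Y\">"
                + "<input type=\"text\" name=\"NME_STREET\" value=\"\">"
                + "</div></form>";
        System.out.println(hiddenInputs(Jsoup.parse(html)));
    }
}
```

The resulting map can be passed to the POST request with `Connection.data(Map<String, String>)`, in addition to the explicit `.data(key, value)` calls.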

Working code:

    try {

        String url = "https://services2.hdb.gov.sg/webapp/BB33RTIS/BB33SSearchWidget";

        Connection.Response response = Jsoup
                .connect(url)
                .userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko)" +
                        " Chrome/33.0.1750.152 Safari/537.36")
                .ignoreHttpErrors(true)
                .method(Connection.Method.GET)
                .execute();

        Document responseDocument = Jsoup.parse(response.body());

        Element rtisEnqFlagID = responseDocument.select("div.row input[type=hidden]").last();
        String name = rtisEnqFlagID.attr("name");
        String value = rtisEnqFlagID.attr("value");

        Document document = Jsoup.connect(url)
                .userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko)" +
                        " Chrome/33.0.1750.152 Safari/537.36")
                .data("FLAT_TYPE", "02")
                .data("NME_NEWTOWN", "BD      Bedok")
                .data("NME_STREET", "")
                .data("NUM_BLK_FROM", "")
                .data("NUM_BLK_TO", "")
                .data("dteRange", "12")
                .data("DTE_APPROVAL_FROM", "May 2015")
                .data("DTE_APPROVAL_TO", "May 2016")
                .data("AMT_RESALE_PRICE_FROM", "")
                .data("AMT_RESALE_PRICE_TO", "")
                .data("Process", "continue")
                .data(name, value)
                .cookies(response.cookies())
                .post();

        Elements tableBody = document.select("div.content table");

        for (Element table : tableBody)
            System.out.println(table);

    } catch (IOException e) {
        e.printStackTrace();
    }

Output:

<table style="margin-bottom: .5em; width: 100%;"> 
 <tbody>
  <tr> 
   <th width="46%" style="text-align: left;"><span>Block</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>514</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Storey</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>07 to 09</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Floor Area (sqm)/Flat Model</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>45.00 <br>Improved</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Lease Commence Date</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>1979</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Resale Price</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>$240,000.00</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Resale Registration Date</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>Jun 2015</span></td> 
  </tr> 
 </tbody>
</table>
<table style="margin-bottom: .5em; width: 100%;"> 
 <tbody>
  <tr> 
   <th width="46%" style="text-align: left;"><span>Block</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>101</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Storey</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>07 to 09</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Floor Area (sqm)/Flat Model</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>45.00 <br>Improved</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Lease Commence Date</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>1978</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Resale Price</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>$240,000.00</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Resale Registration Date</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>Nov 2015</span></td> 
  </tr> 
 </tbody>
</table>
<table style="margin-bottom: .5em; width: 100%;"> 
 <tbody>
  <tr> 
   <th width="46%" style="text-align: left;"><span>Block</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>113</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Storey</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>10 to 12</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Floor Area (sqm)/Flat Model</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>44.00 <br>Improved</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Lease Commence Date</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>1978</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Resale Price</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>$244,000.00</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Resale Registration Date</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>Mar 2016</span></td> 
  </tr> 
 </tbody>
</table>
<table style="margin-bottom: .5em; width: 100%;"> 
 <tbody>
  <tr> 
   <th width="46%" style="text-align: left;"><span>Block</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>535</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Storey</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>01 to 03</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Floor Area (sqm)/Flat Model</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>45.00 <br>Improved</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Lease Commence Date</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>1986</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Resale Price</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>$250,000.00</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Resale Registration Date</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>Jan 2016</span></td> 
  </tr> 
 </tbody>
</table>
<table style="margin-bottom: .5em; width: 100%;"> 
 <tbody>
  <tr> 
   <th width="46%" style="text-align: left;"><span>Block</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>534</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Storey</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>04 to 06</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Floor Area (sqm)/Flat Model</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>45.00 <br>Improved</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Lease Commence Date</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>1986</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Resale Price</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>$248,000.00</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Resale Registration Date</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>Nov 2015</span></td> 
  </tr> 
 </tbody>
</table>
<table style="margin-bottom: .5em; width: 100%;"> 
 <tbody>
  <tr> 
   <th width="46%" style="text-align: left;"><span>Block</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>535</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Storey</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>10 to 12</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Floor Area (sqm)/Flat Model</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>45.00 <br>Improved</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Lease Commence Date</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>1986</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Resale Price</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>$230,000.00</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Resale Registration Date</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>Nov 2015</span></td> 
  </tr> 
 </tbody>
</table>
<table style="margin-bottom: .5em; width: 100%;"> 
 <tbody>
  <tr> 
   <th width="46%" style="text-align: left;"><span>Block</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>535</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Storey</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>04 to 06</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Floor Area (sqm)/Flat Model</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>45.00 <br>Improved</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Lease Commence Date</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>1986</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Resale Price</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>$246,500.00</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Resale Registration Date</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>Oct 2015</span></td> 
  </tr> 
 </tbody>
</table>
<table style="margin-bottom: .5em; width: 100%;"> 
 <tbody>
  <tr> 
   <th width="46%" style="text-align: left;"><span>Block</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>541</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Storey</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>10 to 12</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Floor Area (sqm)/Flat Model</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>45.00 <br>Improved</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Lease Commence Date</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>1985</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Resale Price</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>$238,000.00</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Resale Registration Date</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>Jul 2015</span></td> 
  </tr> 
 </tbody>
</table>
<table style="margin-bottom: .5em; width: 100%;"> 
 <tbody>
  <tr> 
   <th width="46%" style="text-align: left;"><span>Block</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>620</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Storey</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>07 to 09</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Floor Area (sqm)/Flat Model</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>45.00 <br>Improved</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Lease Commence Date</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>1986</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Resale Price</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>$250,000.00</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Resale Registration Date</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>Mar 2016</span></td> 
  </tr> 
 </tbody>
</table>
<table style="margin-bottom: .5em; width: 100%;"> 
 <tbody>
  <tr> 
   <th width="46%" style="text-align: left;"><span>Block</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>618</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Storey</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>04 to 06</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Floor Area (sqm)/Flat Model</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>45.00 <br>Improved</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Lease Commence Date</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>1986</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Resale Price</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>$250,000.00</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Resale Registration Date</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>Feb 2016</span></td> 
  </tr> 
 </tbody>
</table>
<table style="margin-bottom: .5em; width: 100%;"> 
 <tbody>
  <tr> 
   <th width="46%" style="text-align: left;"><span>Block</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>620</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Storey</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>01 to 03</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Floor Area (sqm)/Flat Model</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>45.00 <br>Improved</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Lease Commence Date</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>1986</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Resale Price</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>$245,000.00</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Resale Registration Date</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>May 2015</span></td> 
  </tr> 
 </tbody>
</table>
<table style="margin-bottom: .5em; width: 100%;"> 
 <tbody>
  <tr> 
   <th width="46%" style="text-align: left;"><span>Block</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>38</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Storey</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>07 to 09</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Floor Area (sqm)/Flat Model</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>44.00 <br>Improved</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Lease Commence Date</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>1978</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Resale Price</span> </th> 
   <td width="54%" style="vertical-align: middle;"><span>$253,000.00</span></td> 
  </tr> 
  <tr> 
   <th width="46%" align="left" style="text-align: left;"><span>Resale Registration Date</span></th> 
   <td width="54%" style="vertical-align: middle;"><span>May 2015</span></td> 
  </tr> 
 </tbody>
</table>
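
Since every result table has the same header/value row layout, the records can also be read into maps instead of being printed as raw HTML. This is a rough sketch, assuming the layout shown in the output above; the class and method names are hypothetical, and the inline HTML is a shortened copy of the first table.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class ResaleTableParser {

    // Turns each table into an ordered map of header text -> cell text.
    static List<Map<String, String>> parseTables(Document document) {
        List<Map<String, String>> records = new ArrayList<>();
        for (Element table : document.select("table")) {
            Map<String, String> record = new LinkedHashMap<>();
            for (Element row : table.select("tr")) {
                Element header = row.selectFirst("th");
                Element cell = row.selectFirst("td");
                // Skip rows that do not have both a header and a value cell.
                if (header != null && cell != null) {
                    record.put(header.text(), cell.text());
                }
            }
            if (!record.isEmpty()) {
                records.add(record);
            }
        }
        return records;
    }

    public static void main(String[] args) {
        // Shortened sample of the first table from the output above.
        String html = "<div class=\"content\"><table><tbody>"
                + "<tr><th><span>Block</span> </th><td><span>514</span></td></tr>"
                + "<tr><th><span>Resale Price</span> </th>"
                + "<td><span>$240,000.00</span></td></tr>"
                + "</tbody></table></div>";
        for (Map<String, String> record : parseTables(Jsoup.parse(html))) {
            System.out.println(record);
        }
    }
}
```

Against the live response, the narrower selector `div.content table` from the working code could be used instead of `table` to avoid picking up unrelated tables.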

Upvotes: 3

TDG

Reputation: 6151

The site now uses HTTPS. Change your URL to
String url = "https://services2.hdb.gov.sg/webapp/BB33RTIS/BB33SSearchWidget";
(https instead of http) and it will work.

Upvotes: 2
