user3000019

Reputation: 103

Trouble getting information from HTML tables in Java

I want to get information from the first table on this site: Link

This is the code I have:

Document document = Jsoup.parse(DownloadPage("http://www.transtejo.pt/clientes/horarios" +
            "-ligacoes-fluviais/ligacao-barreiro-terreiro-do-paco/#dias-uteis"));

    Elements table = document.select("table.easy-table-creator:nth-child(1) tbody");
    Elements trAll = table.select("tr");

    //For the Table Hour
    Elements tr_first = table.select("tr:nth-child(1)");
    Element tr = tr_first.get(1);
    Elements td = tr.getElementsByTag("td");
    for(int i = 0; i < td.size(); i++) {
        Log.d("TIME TABLE:"," " + td.get(i).text());

        for(int i1 = 1; i1 < trAll.size(); i1++) {

            Elements td_inside = trAll.get(i1).getElementsByTag("td");
            Log.d("TD INSIDE:"," " + td_inside.get(i).text());


        }



    }

Right now I am able to get information, but I have two problems. First, I am getting content from other tables as well: all the tables share the same class name, and I am having trouble selecting only the table I need. Second, I am also getting an IndexOutOfBoundsException.

This is the log output: Log link

The log I want looks like this: the hour (TIME TABLE), then every row below it that holds the minutes (TD INSIDE) for that hour, then the next hour, and so on.

Thanks for your time.

[EDIT] A better example of the log I want (see the first table):

TIME TABLE: 05H
TD INSIDE: 15
TD INSIDE: 45
TIME TABLE: 06H
TD INSIDE: 15
TD INSIDE: 35
TD INSIDE: 45
TD INSIDE: 55
TIME TABLE: 07H
TD INSIDE: 05
TD INSIDE: 15
TD INSIDE: 20
TD INSIDE: 25
TD INSIDE: 35
TD INSIDE: 40
TD INSIDE: 50
TD INSIDE: 55

(...)

Upvotes: 0

Views: 68

Answers (1)

Davide Pastore

Reputation: 8738

You can do it like this:

Element table = document
  .select("table.easy-table-creator:nth-child(1) tbody").first();
Elements trAll = table.select("tr");
Elements trAllBody = table.select("tr:not(:first-child)");

// For the Table Hour
Element trFirst = trAll.first();
Elements tds = trFirst.select("td");
for(int i = 0; i < tds.size(); i++){
    Element td = tds.get(i);
    Log.d("TIME TABLE:", " " + td.text());

    String query = "td:nth-child(" + (i + 1) + ")";
    Elements subTds = trAllBody.select(query);
    for (int j = 0; j < subTds.size(); j++) {
        Element subTd = subTds.get(j);
        String tdText = subTd.text();
        // Skip empty cells: not every hour has the same number of departures
        if (!tdText.isEmpty()) {
            Log.d("TD INSIDE:", " " + tdText);
        }
    }
}

Some interesting points:

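  • your table.easy-table-creator:nth-child(1) tbody selector matches every table that is the first child of its parent, so it was selecting all the tables on the page; calling first() on the result keeps only the first match;
  • with a progressive select you can retrieve all the tds in a given column: td:nth-child(index), where the index is 1-based (hence i + 1 in the code);
  • trAllBody here contains all the trs that are not the first one (using the tr:not(:first-child) selector).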
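
If you would rather collect the values than log them, the column-by-column grouping can be sketched in plain Java, independent of jsoup. This is only a minimal sketch under some assumptions: the class name TimetableColumns and the method groupByColumn are hypothetical, and the cell texts are assumed to have already been extracted into a list of rows, with the hours in the first row. The bounds check on each row is one way to avoid an IndexOutOfBoundsException when a row has fewer cells than the header.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: groups table cells by column, keyed by the header text.
final class TimetableColumns {

    // rows.get(0) is assumed to be the header row (hours);
    // every following row holds the minutes for each hour column.
    static Map<String, List<String>> groupByColumn(List<List<String>> rows) {
        // LinkedHashMap keeps the hours in their original column order.
        Map<String, List<String>> byHour = new LinkedHashMap<>();
        if (rows.isEmpty()) {
            return byHour;
        }
        List<String> header = rows.get(0);
        for (int col = 0; col < header.size(); col++) {
            List<String> minutes = new ArrayList<>();
            for (int row = 1; row < rows.size(); row++) {
                List<String> cells = rows.get(row);
                // Skip rows that are shorter than the header instead of
                // indexing past their end.
                if (col >= cells.size()) {
                    continue;
                }
                String text = cells.get(col).trim();
                if (!text.isEmpty()) {
                    minutes.add(text);
                }
            }
            // Note: a repeated header text would overwrite the earlier column.
            byHour.put(header.get(col), minutes);
        }
        return byHour;
    }

    public static void main(String[] args) {
        // Sample data taken from the first two hours of the example log.
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("05H", "06H"),
                Arrays.asList("15", "15"),
                Arrays.asList("45", "35"),
                Arrays.asList("", "45"),
                Arrays.asList("", "55"));

        for (Map.Entry<String, List<String>> entry : groupByColumn(rows).entrySet()) {
            System.out.println("TIME TABLE: " + entry.getKey());
            for (String minute : entry.getValue()) {
                System.out.println("TD INSIDE: " + minute);
            }
        }
    }
}
```

In the jsoup code above, each inner list would be the text() of the tds in one tr; on Android you would replace System.out.println with Log.d.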

Upvotes: 1
