Sudeep

Reputation: 37

Jsoup not parsing data from table

I'm trying to scrape this webpage: http://www.skysports.com/football/competitions/la-liga/table. I just want the names of the teams in the table, and I'm using Jsoup for this. Here's my code:

private class LoadData extends AsyncTask<Void,Void,Void> {
    String url = "http://www.skysports.com/football/competitions/la-liga/table";
    String data = "";

    @Override
    protected Void doInBackground(Void... params) {
        Document document;
        try {
            document = Jsoup.connect(url).timeout(0).get();
            Elements clubName = document.select("td.standing-table__cell standing-table__cell--name");
            int a = clubName.size();
            for(int i = 0; i < a; i++) {
                data += "\n\n" +clubName.get(i).text();

            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }

    @Override
    protected void onPostExecute(Void result) {
        teamview = (TextView) findViewById(R.id.club_view);
        teamview.setMovementMethod(new ScrollingMovementMethod());
        teamview.setText(data);
        super.onPostExecute(result);
    }
}

And here's the HTML for one row of the table:

    <tr class="standing-table__row" data-item-id="872">
  <td class="standing-table__cell">1</td>
  <td class="standing-table__cell standing-table__cell--name" data-short-name="Atletico Madrid" data-long-name="Atletico Madrid">

            <a href="/football/teams/atletico-madrid" class="standing-table__cell--name-link">Atletico Madrid</a>

  </td>
  <td class="standing-table__cell">19</td>
  <td class="standing-table__cell is-hidden--bp35">14</td>
  <td class="standing-table__cell is-hidden--bp35">2</td>
  <td class="standing-table__cell is-hidden--bp35">3</td>
  <td class="standing-table__cell is-hidden--bp35">27</td>
  <td class="standing-table__cell is-hidden--bp35">8</td>
  <td class="standing-table__cell">19</td>
  <td class="standing-table__cell" data-sort-value="1">44</td>
  <td class="standing-table__cell is-hidden--bp15 is-hidden--bp35 " data-sort-value="15333033">
          <div class="standing-table__form">
      <span title="Granada 0-2 Atletico Madrid" class="standing-table__form-cell standing-table__form-cell--win"> </span><span title="Atletico Madrid 2-1 Athletic Bilbao" class="standing-table__form-cell standing-table__form-cell--win"> </span><span title="Malaga 1-0 Atletico Madrid" class="standing-table__form-cell standing-table__form-cell--loss"> </span><span title="Rayo Vallecano 0-2 Atletico Madrid" class="standing-table__form-cell standing-table__form-cell--win"> </span><span title="Atletico Madrid 1-0 Levante" class="standing-table__form-cell standing-table__form-cell--win"> </span><span title="Celta Vigo 0-2 Atletico Madrid" class="standing-table__form-cell standing-table__form-cell--win"> </span>        </div>
        </td>

</tr>

When I use document.select("td.standing-table__cell"); the data is shown. But when I use document.select("td.standing-table__cell standing-table__cell--name"); instead, no data is shown. Why?

Upvotes: 0

Views: 396

Answers (2)

Gareth1305

Reputation: 106

The code below loops over every row of the table and, for each row, prints the club name, which it finds through the CSS class of the link inside that row's name cell.

    String url = "http://www.skysports.com/football/competitions/la-liga/table";
    try {
        Document document = Jsoup.connect(url).timeout(0).get();
        Elements clubRow = document.select("tr.standing-table__row");
        for(Element club: clubRow) {
            System.out.println(club.select("a.standing-table__cell--name-link").text());
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
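
To try the row-by-row approach without hitting the live site, the same loop can be run against a local copy of the markup. This is a sketch assuming jsoup is on the classpath; the class name RowScrapeDemo and the trimmed HTML string are illustrative. The string is cut down from the row in the question and wrapped in a <table> so the HTML parser keeps the <tr> and <td> tags; the live page's markup may have changed since.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

public class RowScrapeDemo {

    // Trimmed copy of the row markup from the question, wrapped in <table>
    // so the parser does not drop the stray <tr>/<td> tags.
    static final String HTML =
            "<table>"
            + "<tr class=\"standing-table__row\" data-item-id=\"872\">"
            + "<td class=\"standing-table__cell\">1</td>"
            + "<td class=\"standing-table__cell standing-table__cell--name\""
            + " data-short-name=\"Atletico Madrid\" data-long-name=\"Atletico Madrid\">"
            + "<a href=\"/football/teams/atletico-madrid\" class=\"standing-table__cell--name-link\">Atletico Madrid</a>"
            + "</td>"
            + "<td class=\"standing-table__cell\">19</td>"
            + "</tr>"
            + "</table>";

    // Walk each table row and collect the text of its club-name link.
    static List<String> clubNames(Document document) {
        List<String> names = new ArrayList<>();
        Elements rows = document.select("tr.standing-table__row");
        for (Element row : rows) {
            names.add(row.select("a.standing-table__cell--name-link").text());
        }
        return names;
    }

    public static void main(String[] args) {
        // Parse the local sample instead of fetching the live page.
        Document document = Jsoup.parse(HTML);
        System.out.println(clubNames(document)); // prints [Atletico Madrid]
    }
}
```

Replacing Jsoup.parse(HTML) with Jsoup.connect(url).get() (and handling its IOException) turns this back into the networked version from the answer.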

Upvotes: 0

luksch

Reputation: 11712

The selector document.select("td.standing-table__cell standing-table__cell--name"); selects all elements with the tag name standing-table__cell--name that are descendants (at any depth) of td elements with the class standing-table__cell. The space between the two parts is the CSS descendant combinator, not a separator between classes. No such elements exist, so Jsoup returns an empty list.
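
A quick way to see the descendant combinator at work is to run both kinds of query against a minimal version of the name cell. This is a sketch assuming jsoup is on the classpath; DescendantSelectorDemo and its cut-down markup are hypothetical, modelled on the HTML in the question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class DescendantSelectorDemo {

    // Minimal markup modelled on the question's name cell.
    static final String HTML =
            "<table><tr>"
            + "<td class=\"standing-table__cell standing-table__cell--name\">"
            + "<a class=\"standing-table__cell--name-link\">Atletico Madrid</a>"
            + "</td>"
            + "</tr></table>";

    // Parse the sample and count how many elements a CSS query matches.
    static int count(String cssQuery) {
        Document document = Jsoup.parse(HTML);
        return document.select(cssQuery).size();
    }

    public static void main(String[] args) {
        // The space is a descendant combinator, so this looks for a
        // <standing-table__cell--name> tag inside the td: there is none.
        System.out.println(count("td.standing-table__cell standing-table__cell--name")); // 0
        // A descendant query that names a real tag does match: the <a> inside the td.
        System.out.println(count("td.standing-table__cell a")); // 1
    }
}
```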

What you probably want is to select td elements that carry both classes, standing-table__cell and standing-table__cell--name. In CSS selector syntax this is written as:

    document.select("td.standing-table__cell.standing-table__cell--name");

Note: a dot followed by a class name is the CSS selector for a class. Several of them can be chained with no space in between, in which case the element must have all of those classes.
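
The sketch below contrasts the single-class query from the question with the chained one. It assumes jsoup is on the classpath and uses a hypothetical CompoundSelectorDemo class with markup trimmed from the question. It also shows why the single-class query appeared to work: every cell in the row carries standing-table__cell, so that query returns the other cells as well as the name. The team name can also be read from the cell's data-long-name attribute, which is present in the question's HTML.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class CompoundSelectorDemo {

    // Three cells trimmed from the question's row; only one is the name cell.
    static final String HTML =
            "<table><tr class=\"standing-table__row\">"
            + "<td class=\"standing-table__cell\">1</td>"
            + "<td class=\"standing-table__cell standing-table__cell--name\""
            + " data-long-name=\"Atletico Madrid\">"
            + "<a class=\"standing-table__cell--name-link\">Atletico Madrid</a>"
            + "</td>"
            + "<td class=\"standing-table__cell\">19</td>"
            + "</tr></table>";

    // Parse the sample and run a CSS query against it.
    static Elements select(String cssQuery) {
        return Jsoup.parse(HTML).select(cssQuery);
    }

    public static void main(String[] args) {
        // Any td with the first class: matches all three cells.
        Elements allCells = select("td.standing-table__cell");
        // Classes chained with no space: the td must carry both.
        Elements nameCells = select("td.standing-table__cell.standing-table__cell--name");

        System.out.println(allCells.size());  // 3
        System.out.println(nameCells.size()); // 1
        for (Element cell : nameCells) {
            System.out.println(cell.text());                 // Atletico Madrid
            System.out.println(cell.attr("data-long-name")); // Atletico Madrid
        }
    }
}
```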

Upvotes: 1
