Akaitenshi
Reputation: 373

How to read an HTML table with Jsoup

I am trying to read the table with the cities from here

Essentially, I want all the city names, but I am stuck at the part where I traverse into the table.

Element table = rawCities.getElementById("content")
        .getElementById("bodyContent")
        .getElementById("mw-content-text")
        .select("table.wikitable sortable jquery-tablesorter").first()
        .select("tbody").first();

The document is downloaded and parsed with Jsoup.connect in another class, and here I am trying to get the city names. When I traverse with the selects, I get a NullPointerException at this statement. If I remove the .select("tbody").first() part, the program runs, but the debugger shows that the table variable is null. Should I be doing this another way, or did I get something wrong?

Upvotes: 0

Views: 339

Answers (1)

Pshemo
Reputation: 124215

If you print rawCities, you will most probably not find any element that represents a <jquery-tablesorter> tag, which is what the last part of your selector asks for. So you should remove it from your select.

Another problem is that the selector table.wikitable sortable will try to find a <sortable> element inside a table with class wikitable:

<table class="wikitable">
  ...
    <sortable>
  ...
</table>

not

<table class="wikitable sortable">...

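To find an element that has several classes, put the . operator before each class name with no space in between, like element.class1.class2. A space describes an ancestor-descendant relationship, so element.class1 class2 looks for a <class2> element somewhere inside element.class1.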
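The difference between the selectors can be checked on a small inline document. This is only a sketch: the class name SelectorDemo, the helper count, and the sample markup (including the city name) are made up for illustration and stand in for the downloaded page.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class SelectorDemo {
    // Hypothetical sample markup standing in for the downloaded page.
    static final String HTML =
        "<table class=\"wikitable sortable\">"
      + "<tbody><tr><td><a href=\"#\">Alphaville</a></td><td>1</td></tr></tbody>"
      + "</table>";

    // Returns how many elements in the sample markup match the CSS selector.
    static int count(String css) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(css).size();
    }

    public static void main(String[] args) {
        // Space = descendant combinator: looks for a <sortable> tag inside the table.
        System.out.println(count("table.wikitable sortable"));
        // Chained dots: one <table> element carrying both classes.
        System.out.println(count("table.wikitable.sortable"));
        // Requires a class that the sample markup does not contain.
        System.out.println(count("table.wikitable.sortable.jquery-tablesorter"));
    }
}
```

Only the second selector matches the sample table. The first one looks for a tag that does not exist, and the third one requires a class that the markup does not have.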

So your code could be simplified to

Element table = rawCities
        .select("table.wikitable.sortable tbody")
        .first();
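Note that first() returns null when the selector matches nothing, which is the likely source of the NullPointerException in the question: the next call in the chain is made on null. A minimal guard could look like the sketch below; the class name FirstOrNull, the helper rowCount, the -1 sentinel, and the sample markup are assumptions made for this example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class FirstOrNull {
    // Hypothetical sample markup with two data rows.
    static final String HTML =
        "<table class=\"wikitable sortable\"><tbody>"
      + "<tr><td><a href=\"#\">Alphaville</a></td></tr>"
      + "<tr><td><a href=\"#\">Betatown</a></td></tr>"
      + "</tbody></table>";

    // Returns the number of rows under the first element matched by css,
    // or -1 when the selector matches nothing.
    static int rowCount(Document doc, String css) {
        Element tbody = doc.select(css).first(); // null when there is no match
        if (tbody == null) {
            return -1; // avoid calling select(...) on null
        }
        return tbody.select("tr").size();
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        System.out.println(rowCount(doc, "table.wikitable.sortable tbody"));
        System.out.println(rowCount(doc, "table.wikitable sortable jquery-tablesorter"));
    }
}
```

Checking for null before chaining further calls makes a wrong selector show up as an explicit result instead of an exception deeper in the chain.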

Anyway, if you only want to print the content of the first column of the selected table, you can do it with

for (Element row : rawCities.select("table.wikitable.sortable td:eq(0) a")) {
    System.out.println(row.text());
}

You can also use this loop to add the results of row.text() to a List<String> created earlier, or use stream-based code like

List<String> names = rawCities
        .select("table.wikitable.sortable td:eq(0) a")
        .stream()
        .map(e -> e.text())
        .collect(Collectors.toList());

Upvotes: 2
