Mounir Elfassi

Reputation: 2252

jsoup for unstructured html page with table

I'm trying to get the main image from this URL. Here is what I have tried so far:

Document doc = null;
try {
    doc = Jsoup.connect(url).get();
} catch (IOException e) {
    e.printStackTrace();
}

Element table = doc.select("center").get(1);
Elements rows = table.select("table[width=970]");
for (int i = 0; i < rows.size(); i++) {
    Element row = rows.get(1);
    Elements cols = row.select("table[width=634]");
    for (int j = 0; j < cols.size(); j++) {
        Element row1 = rows.get(1);
        Elements cols1 = row1.select("table[width=600]");
        for (int k = 0; k < cols1.size(); k++) {
            Element row0 = rows.first();
            Elements cols0 = row0.select("td");
            for (Element image : cols0) {
                String image2 = image.absUrl("src").toString();
                Log.i("tanja7 ", "pic  " + image2);
            }
        }
    }
}

The unstructured HTML page is shown as a screenshot, since I don't know how to copy the HTML code. What am I doing wrong?

Upvotes: 0

Views: 85

Answers (1)

luksch

Reputation: 11712

It seems you expect a jsoup select call to return the inner elements of whatever it matches. It does not: select returns the elements that themselves match the selector within the search scope, and that scope is the Element, Elements or Document instance on which you call select. So doc.select("table") gives you every table element in the document, that is, the tables and not their rows. Maybe you already understood this, but your variable naming suggests otherwise.
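To make the scoping concrete, here is a minimal sketch that runs the same kind of selector against the whole document and against a single element. The markup, the ids "outer" and "inner", and the class and method names are invented for illustration; they are not taken from the page in the question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SelectScopeDemo {
    // Hypothetical markup: one table nested inside a cell of another.
    static final String HTML =
        "<table id='outer'><tr><td>"
      + "<table id='inner'><tr><td>cell</td></tr></table>"
      + "</td></tr></table>";

    // Search scope: the whole document.
    public static int countInDocument(String css) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(css).size();
    }

    // Search scope: only the element with the given id (and what it contains).
    public static int countInElement(String id, String css) {
        Document doc = Jsoup.parse(HTML);
        Element scope = doc.getElementById(id);
        return scope.select(css).size();
    }

    public static void main(String[] args) {
        // The matches are the table elements themselves, not their rows.
        System.out.println(countInDocument("table"));      // 2
        // Both cells are inside the outer table...
        System.out.println(countInElement("outer", "td")); // 2
        // ...but only one is inside the inner table.
        System.out.println(countInElement("inner", "td")); // 1
    }
}
```

The selector stays the same in the last two calls; only the object on which select is called changes, and with it the set of elements that can match.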

Anyway, here is a selector that works. It gets all img elements that are descendants (not necessarily direct children) of a table with the attribute width=600 that is itself nested inside another table of the document.

Elements imgEls = doc.select("table table[width=600] img");
System.out.println(imgEls.first().absUrl("src"));

You say the HTML is not structured, so you might want to check whether the relevant images really are always inside two nested tables, as the selector requires.
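One way to guard against pages that do not have the expected nesting is to treat "no match" as a normal outcome, since first() returns null when the selection is empty. The sketch below assumes hypothetical markup, a hypothetical base URI (http://example.com/) and invented class and method names; only the selector comes from the answer.

```java
import java.util.Optional;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class MainImageFinder {
    // Hypothetical pages: one with the expected nesting, one without it.
    static final String NESTED =
        "<table><tr><td><table width='600'><tr><td>"
      + "<img src='pics/main.jpg'></td></tr></table></td></tr></table>";
    static final String FLAT =
        "<table width='600'><tr><td><img src='pics/main.jpg'></td></tr></table>";

    // Returns the absolute URL of the first matching image, or empty when
    // the selector matches nothing or the src cannot be made absolute.
    public static Optional<String> mainImageUrl(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        Element img = doc.select("table table[width=600] img").first();
        if (img == null) {
            return Optional.empty(); // the page lacks the expected table nesting
        }
        String url = img.absUrl("src"); // empty string if it cannot be resolved
        return url.isEmpty() ? Optional.empty() : Optional.of(url);
    }

    public static void main(String[] args) {
        System.out.println(mainImageUrl(NESTED, "http://example.com/").isPresent()); // true
        System.out.println(mainImageUrl(FLAT, "http://example.com/").isPresent());   // false
    }
}
```

Returning an Optional keeps the caller from hitting a NullPointerException on pages where the width=600 table is not nested inside another table.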

Update: if you are running this on a mobile device, make sure to set a user agent on the connection:

doc = Jsoup.connect(url).userAgent("Mozilla").get();

Upvotes: 1
