Reputation: 2252
I'm trying to get the main image from this URL. Here is what I have tried so far:
Document doc = null;
try {
    doc = Jsoup.connect(url).get();
} catch (IOException e) {
    e.printStackTrace();
}
Element table = doc.select("center").get(1);
Elements rows = table.select("table[width=970]");
for (int i = 0; i < rows.size(); i++) {
    Element row = rows.get(1);
    Elements cols = row.select("table[width=634]");
    for (int j = 0; j < cols.size(); j++) {
        Element row1 = rows.get(1);
        Elements cols1 = row1.select("table[width=600]");
        for (int k = 0; k < cols1.size(); k++) {
            Element row0 = rows.first();
            Elements cols0 = row0.select("td");
            for (Element image : cols0) {
                String image2 = image.absUrl("src").toString();
                Log.i("tanja7 ", "pic " + image2);
            }
        }
    }
}
This is the unstructured HTML page (I don't know how to copy the HTML code).
What am I doing wrong?
Upvotes: 0
Views: 85
Reputation: 11712
It seems that you expect a JSoup select call to return the inner elements of whatever it matches. That is not how it works: you get the elements that match the selector within the "search scope", which is given by the Element(s)/Document instance on which you call select. So, if you want to get all table elements of the document, you call doc.select("table"). This gives you the tables, not the rows. Maybe you already understood this, but your variable naming suggests otherwise.
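To make the scoping point concrete, here is a minimal sketch. The markup is made up for illustration (it is not the page from the question), and the class name SelectScopeDemo is hypothetical. It only assumes that the jsoup library is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class SelectScopeDemo {
    // Hypothetical markup: one table nested inside another.
    static final String HTML =
        "<table id=\"outer\"><tr><td>"
      + "<table id=\"inner\" width=\"600\"><tr><td>cell</td></tr></table>"
      + "</td></tr></table>";

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);

        // Scope: the whole document. The result contains the table
        // elements themselves (outer and inner), not their rows.
        Elements tables = doc.select("table");
        System.out.println("tables found: " + tables.size());

        // Scope: the outer table and everything below it. Only the
        // nested table carries width=600, so only that one matches.
        Element outer = tables.first();
        Elements nested = outer.select("table[width=600]");
        System.out.println("nested table id: " + nested.first().id());

        // To get rows, the selector has to ask for rows explicitly.
        Elements rows = outer.select("tr");
        System.out.println("rows below outer table: " + rows.size());
    }
}
```

In the question's code, names like rows and cols hold table elements, which is probably why the nested loops never reach an img element.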
Anyway, here is a selector that works. It gets all img elements that are descendants (not necessarily direct children) of a table with the attribute width=600 that is itself nested inside another table of the document.
Elements imgEls = doc.select("table table[width=600] img");
System.out.println(imgEls.first().absUrl("src"));
You say the HTML is not structured, so you might want to check whether the relevant images are really always inside two tables as specified.
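One way to guard against pages where that nesting is missing is to check the result before calling first(), which returns null on an empty result. The following is only a sketch: the class name MainImageFinder, the method name firstImageUrl, and the example.com markup are hypothetical and not taken from the question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class MainImageFinder {
    // Returns the absolute URL of the first matching image,
    // or null if the expected table nesting is not present.
    static String firstImageUrl(Document doc) {
        Elements imgEls = doc.select("table table[width=600] img");
        if (imgEls.isEmpty()) {
            return null; // nothing matched, so avoid calling first() on it
        }
        return imgEls.first().absUrl("src");
    }

    public static void main(String[] args) {
        // Made-up markup and base URI, used only to exercise the method.
        String html =
            "<table><tr><td>"
          + "<table width=\"600\"><tr><td><img src=\"pics/main.jpg\"></td></tr></table>"
          + "</td></tr></table>";
        Document doc = Jsoup.parse(html, "http://example.com/");
        System.out.println(firstImageUrl(doc));

        Document other = Jsoup.parse("<p>no tables here</p>", "http://example.com/");
        System.out.println(firstImageUrl(other));
    }
}
```

Returning null keeps the sketch short. In real code you may prefer to log the miss or fall back to a looser selector.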
Update: if you are running this on a mobile device, make sure to set a user agent, since the server may otherwise return different HTML:
doc = Jsoup.connect(url).userAgent("Mozilla").get();
Upvotes: 1