Reputation: 1
I have a question about getting images from a web page using Jsoup in Java. Here is the code:
    String link = "http://truyentranhtuan.com/detective-conan/856/doc-truyen/";
    Document docs = Jsoup.connect(link).timeout(60000).get();
    Elements comics = docs.select("#hienthitruyen img");
    System.out.println(comics.size());
    // Counter lives outside the loop so each image gets its own file name
    int i = 0;
    for (Element comic : comics) {
        System.out.println(comic);
        String linkImage = comic.attr("src");
        if (!"".equals(linkImage)) {
            URL url = new URL(linkImage);
            BufferedImage image = ImageIO.read(url);
            ImageIO.write(image, "jpg", new File(i + ".jpg"));
            i++;
        }
    }
The problem is that I can't get any img tags from this page: the size of the Elements is always zero. But when I view the source of the page, the img tags are always there.
Upvotes: 0
Views: 152
Reputation: 26132
If you look at the real source rather than the DOM structure (for example, save the HTML page and open it in Notepad), you will see that there are no img tags in it. They are all added dynamically by JavaScript.
The problem is that Jsoup does not execute JavaScript, so it can only parse the HTML as the server sends it, before JavaScript modifies it (fills it with images). To do what you want, you can use HtmlUnit, which can execute most JavaScript.
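The "look at the real source" check can also be done in code instead of Notepad. Below is a minimal sketch using only the JDK (Java 11+ for `java.net.http`). The class name `RawHtmlCheck` and its helper methods are made up for illustration, and the regex count is deliberately crude: it would also match `<img` inside a script string or an HTML comment, so treat the result as a hint, not proof.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts <img tags in HTML exactly as the server sends it,
// i.e. before any JavaScript has run.
class RawHtmlCheck {
    private static final Pattern IMG_TAG =
            Pattern.compile("<img\\b", Pattern.CASE_INSENSITIVE);

    // Crude textual count of <img occurrences in the given HTML string.
    static int countImgTags(String html) {
        Matcher matcher = IMG_TAG.matcher(html);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    // Downloads the page body without executing any scripts.
    static String fetchRawHtml(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(60))
                .GET()
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        String html;
        if (args.length > 0) {
            // Pass the page URL as the first argument to check a live page.
            html = fetchRawHtml(args[0]);
        } else {
            // Made-up sample: the container is empty and a script adds the image.
            html = "<div id=\"hienthitruyen\"></div>"
                    + "<script>var im = document.createElement('img');"
                    + "document.getElementById('hienthitruyen').appendChild(im);"
                    + "</script>";
        }
        System.out.println("img tags in raw HTML: " + countImgTags(html));
    }
}
```

If this reports zero for a page where the browser shows images, the images are being added by script, and a selector such as `#hienthitruyen img` has nothing to match in what Jsoup downloads.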
Upvotes: 1