Constantin N.

Reputation: 2839

How to retrieve a link from JavaScript using Jsoup

I'm building a scraper app and I need to get an image from a website. The problem is that when I fetch the page, I get the default image (the placeholder that appears before the page finishes loading). Once loading completes, JavaScript replaces it with the real image: same HTML path, but a different source URL.

This is my snippet:

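import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class Scapper {

    // Placeholder only: the real page URL is not shown in the post.
    private static final String Url = "https://example.com/product";

    public static void main(String[] args) throws IOException {
        // Fetch and parse the static HTML; no JavaScript is executed here.
        Document doc = Jsoup.connect(Url).get();

        // Select <img> elements that have a src attribute inside the thumbnail wrapper.
        for (Element img : doc.getElementsByClass("prdct-dtl__thmbnl-wrpr").select(".prdct-dtl__thmbnl").select("img[src]")) {
            String url = img.absUrl("src");
            System.out.println("Found: " + url);
        }
    }
}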

Update [This is the link to the website][1]

[1]:

I only need the links to the phone's images.

Upvotes: 1

Views: 501

Answers (1)

Tom C

Reputation: 814

Thanks for the question edit, by the way. After looking over your question and the site you want to parse, you are correct: the images are loaded via JavaScript, and Jsoup returns the placeholder images.

This is because the images you actually want are loaded later and added to the DOM by JavaScript. Jsoup is an HTML parser, not a browser: it does not execute scripts, so it never sees the newly loaded images.
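To make this concrete, here is a minimal sketch using only the Java standard library (no Jsoup). The HTML, the `example.com` URLs, and the class and method names are all invented for illustration; they are not taken from the real page. In this made-up page, the static markup carries only the placeholder `src`, while the real URL exists only as a string inside a `<script>` block, which an HTML parser treats as plain text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StaticHtmlDemo {

    // Hypothetical page: markup and URLs are assumptions, not the real site.
    public static final String HTML =
        "<div class=\"prdct-dtl__thmbnl\">"
      + "<img id=\"thumb\" src=\"https://example.com/img/placeholder.png\">"
      + "</div>"
      + "<script>"
      + "document.getElementById('thumb').src = 'https://example.com/img/phone-1.jpg';"
      + "</script>";

    // Matches the double-quoted src attribute of <img> tags in the raw markup.
    private static final Pattern IMG_SRC =
        Pattern.compile("<img\\b[^>]*\\bsrc=\"([^\"]*)\"");

    // Matches the body of each <script> element.
    private static final Pattern SCRIPT_BODY =
        Pattern.compile("<script\\b[^>]*>(.*?)</script>", Pattern.DOTALL);

    // Matches quoted http(s) URLs that end in a common image extension.
    private static final Pattern QUOTED_IMAGE_URL =
        Pattern.compile("['\"](https?://[^'\"]+\\.(?:jpg|jpeg|png|webp))['\"]");

    // What a static parser can see: the src values written in the markup.
    public static List<String> imgSrcs(String html) {
        List<String> out = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    // Image URLs that appear only as string literals inside scripts.
    public static List<String> imageUrlsInScripts(String html) {
        List<String> out = new ArrayList<>();
        Matcher script = SCRIPT_BODY.matcher(html);
        while (script.find()) {
            Matcher url = QUOTED_IMAGE_URL.matcher(script.group(1));
            while (url.find()) {
                out.add(url.group(1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("img src in static HTML: " + imgSrcs(HTML));
        System.out.println("URLs inside scripts:    " + imageUrlsInScripts(HTML));
    }
}
```

Regular expressions are used here only to keep the sketch dependency-free; they are not a robust way to parse HTML. If the real URLs do turn up in the raw response like this, Jsoup alone may be enough (for example, by reading the text of `script` elements). If they are fetched by a later request, something has to actually run the JavaScript.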

There are ways around this: you can retrieve the images after they are added to the DOM, but that requires a third-party tool such as PhantomJS or GhostDriver, and you have to wait until the page's JavaScript (including any jQuery code) has finished loading its assets.
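Whichever browser-automation tool you pick, the "waiting" part usually comes down to polling the rendered page until the placeholder has been replaced. The sketch below shows only that loop, using the standard library. `fetchRenderedSrc` stands in for a call into a headless browser and is simulated here; the class name, timeouts and URLs are assumptions made for the example.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.function.Predicate;
import java.util.function.Supplier;

public class WaitForRealImage {

    // Calls source repeatedly until accepted returns true or the timeout elapses.
    // Returns Optional.empty() on timeout or if the thread is interrupted.
    public static <T> Optional<T> pollUntil(Supplier<T> source, Predicate<T> accepted,
                                            Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            T value = source.get();
            if (value != null && accepted.test(value)) {
                return Optional.of(value);
            }
            if (System.nanoTime() - deadline >= 0) {
                return Optional.empty();
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return Optional.empty();
            }
        }
    }

    public static void main(String[] args) {
        String placeholder = "https://example.com/img/placeholder.png";

        // Simulated browser: reports the placeholder twice, then the real image.
        int[] calls = {0};
        Supplier<String> fetchRenderedSrc = () ->
            ++calls[0] < 3 ? placeholder : "https://example.com/img/phone-1.jpg";

        Optional<String> real = pollUntil(fetchRenderedSrc,
                src -> !src.equals(placeholder),
                Duration.ofSeconds(2), Duration.ofMillis(50));

        System.out.println(real.orElse("timed out, still the placeholder"));
    }
}
```

In a real scraper the supplier would ask the browser driver for the current `src` of the thumbnail, and the resulting page source could then be handed to Jsoup for the usual selector work.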

Here is a similar question that may help.

Upvotes: 1
