Geordy James

Reputation: 2408

How to extract src of an image using Jsoup

I am trying to scrape the image src of a product from the online shopping site Flipkart using Jsoup. Here is the code I tried.

String url = "http://www.flipkart.com/moto-g-3rd-generation/p/itme9ysjr7mfry3n?pid=MOBE6KK93JG5WKB2&cmpid=content_mobile_8965229628_gmc_pla&tgi=sem%2C1%2CG%2C11214002%2Cg%2Csearch%2C%2C50314733420%2C1o1%2C%2C%2Cc%2C%2C%2C%2C%2C%2C%2C&gclid=COXtgdLyiMoCFUyhaAodIawO8w";   

Document doc = Jsoup.connect(url).get();

Elements imageElements = doc.select("img[class=productImage]");

String img = imageElements.attr("src");

System.out.println(img);

Here is the HTML from the website: [screenshot of HTML code]

Upvotes: 0

Views: 1092

Answers (1)

tagh

Reputation: 1037

EDIT: This works.

String url = "http://www.flipkart.com/moto-g-3rd-generation/p/itme9ysjr7mfry3n?pid=MOBE6KK93JG5WKB2&cmpid=content_mobile_8965229628_gmc_pla&tgi=sem%2C1%2CG%2C11214002%2Cg%2Csearch%2C%2C50314733420%2C1o1%2C%2C%2Cc%2C%2C%2C%2C%2C%2C%2C&gclid=COXtgdLyiMoCFUyhaAodIawO8w";   

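        Document doc;
        try {
            doc = Jsoup.connect(url).get();
            // Select every <img> element that carries the productImage class
            Elements imageElements = doc.select("img.productImage");

            // Read data-src rather than src from each matching element
            for (Element e : imageElements) {
                System.out.println(e.attr("data-src"));
            }

        } catch (IOException e) {
            // Fetching the page failed (network or HTTP error)
            e.printStackTrace();
        }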

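Changes:

  1. You used img[class=productImage], which can be shortened to img.productImage. The class selector also matches when the element has additional classes, while [class=productImage] requires the whole class attribute to be productImage.

  2. You didn't wrap the call in a try/catch (relatively unimportant).

  3. You called attr on the whole Elements collection, which only returns the value from the first matching element that has the attribute. Use a loop to read it from every element.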

EDIT: OK, I found the cause of the weird src value. It seems that JavaScript on the page modifies the src attribute after the page is rendered. The real image URL is the data-src value, which the script later removes. Jsoup does not execute JavaScript, so it sees the original HTML, where data-src is still present. Weird, huh?
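If some images on a page carry data-src and others only a plain src, a small fallback can cover both. This is only a sketch: the class ImageSrcPicker and the method pickImageUrl are hypothetical names, and it assumes the two arguments come from Jsoup's Element.attr, which returns an empty string when the attribute is absent.

```java
class ImageSrcPicker {

    // Hypothetical helper: prefer the data-src value and fall back to src.
    // Both arguments are assumed to be what Element.attr(...) returned,
    // so a missing attribute arrives as an empty string.
    static String pickImageUrl(String dataSrc, String src) {
        if (dataSrc != null && !dataSrc.trim().isEmpty()) {
            return dataSrc.trim();
        }
        return src == null ? "" : src.trim();
    }

    public static void main(String[] args) {
        // Inside the Jsoup loop this would be called as:
        //   pickImageUrl(e.attr("data-src"), e.attr("src"))
        // The URLs below are made-up sample values.
        System.out.println(pickImageUrl("http://example.com/real.jpg", "placeholder.gif"));
        System.out.println(pickImageUrl("", "http://example.com/plain.jpg"));
    }
}
```

In the loop above, System.out.println(e.attr("data-src")) would then become System.out.println(ImageSrcPicker.pickImageUrl(e.attr("data-src"), e.attr("src"))).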

Upvotes: 1
