Reputation: 2408
I am trying to scrape the image src of a product from the online shopping site Flipkart using Jsoup. Here is the code I tried.
String url = "http://www.flipkart.com/moto-g-3rd-generation/p/itme9ysjr7mfry3n?pid=MOBE6KK93JG5WKB2&cmpid=content_mobile_8965229628_gmc_pla&tgi=sem%2C1%2CG%2C11214002%2Cg%2Csearch%2C%2C50314733420%2C1o1%2C%2C%2Cc%2C%2C%2C%2C%2C%2C%2C&gclid=COXtgdLyiMoCFUyhaAodIawO8w";
Document doc = Jsoup.connect(url).get();
Elements imageElements = doc.select("img[class=productImage]");
String img = imageElements.attr("src");
System.out.println(img);
Here is the HTML from the website: [screenshot of HTML code]
Upvotes: 0
Views: 1092
Reputation: 1037
EDIT: This works.
String url = "http://www.flipkart.com/moto-g-3rd-generation/p/itme9ysjr7mfry3n?pid=MOBE6KK93JG5WKB2&cmpid=content_mobile_8965229628_gmc_pla&tgi=sem%2C1%2CG%2C11214002%2Cg%2Csearch%2C%2C50314733420%2C1o1%2C%2C%2Cc%2C%2C%2C%2C%2C%2C%2C&gclid=COXtgdLyiMoCFUyhaAodIawO8w";
try {
    Document doc = Jsoup.connect(url).get();
    // The class selector matches every <img> that carries the productImage class.
    Elements imageElements = doc.select("img.productImage");
    for (Element e : imageElements) {
        // The served HTML keeps the real image URL in data-src, not src.
        System.out.println(e.attr("data-src"));
    }
} catch (IOException e) {
    e.printStackTrace();
}
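Changes:
You used img[class=productImage], which can be shortened to img.productImage. The class selector also matches elements that carry additional classes, while [class=...] requires the attribute value to match exactly.
You didn't wrap the call in a try/catch (relatively unimportant, but get() throws IOException).
You called attr on the whole Elements collection, which returns the value from the first matching element only. Use a loop to read the attribute from every element.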
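To illustrate the selector and attr points above, here is a minimal sketch that runs offline against an inline HTML string. The markup, file names, and the SelectorDemo class are made up for the example and are not taken from the Flipkart page; it assumes Jsoup is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

public class SelectorDemo {
    // Hypothetical markup: the second image carries an extra class.
    static final String HTML =
            "<img class=\"productImage\" src=\"a.jpg\">"
          + "<img class=\"productImage current\" src=\"b.jpg\">";

    // [class=productImage] compares the whole attribute value.
    static int exactMatchCount() {
        return Jsoup.parse(HTML).select("img[class=productImage]").size();
    }

    // .productImage matches any element that has the class among its classes.
    static int classMatchCount() {
        return Jsoup.parse(HTML).select("img.productImage").size();
    }

    // Elements.attr reads the attribute from the first matching element only.
    static String firstSrc() {
        return Jsoup.parse(HTML).select("img.productImage").attr("src");
    }

    // Looping over the elements collects the attribute from each of them.
    static List<String> allSrc() {
        Document doc = Jsoup.parse(HTML);
        Elements images = doc.select("img.productImage");
        List<String> result = new ArrayList<>();
        for (Element img : images) {
            result.add(img.attr("src"));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(exactMatchCount());
        System.out.println(classMatchCount());
        System.out.println(firstSrc());
        System.out.println(allSrc());
    }
}
```

With this markup, the exact attribute selector finds one image and the class selector finds both, which is why the class form is usually the safer choice when a site adds extra classes.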
EDIT: OK, I found the cause of the weird src value. It seems that JavaScript on the page rewrites the "src" attribute after the page loads. The real image URL is in the "data-src" attribute, which the script later removes. Jsoup does not run JavaScript, so it sees the original markup with "data-src" still present. Weird, huh?
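If some images on a page are lazy-loaded and others are not, one option is to prefer "data-src" and fall back to "src". The sketch below shows that idea on an inline HTML string; the markup, the file names, and the LazyImageDemo class are assumptions for illustration and do not reflect the actual Flipkart page.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class LazyImageDemo {
    // Element.attr returns an empty string when the attribute is absent,
    // so an empty data-src means we should fall back to src.
    static String imageUrl(Element img) {
        String lazy = img.attr("data-src");
        return lazy.isEmpty() ? img.attr("src") : lazy;
    }

    // Collects one URL per image that has the productImage class.
    static List<String> imageUrls(String html) {
        Document doc = Jsoup.parse(html);
        List<String> urls = new ArrayList<>();
        for (Element img : doc.select("img.productImage")) {
            urls.add(imageUrl(img));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Hypothetical markup: the first image is lazy-loaded, the second is not.
        String html =
                "<img class=\"productImage\" src=\"placeholder.gif\" data-src=\"real-1.jpg\">"
              + "<img class=\"productImage\" src=\"real-2.jpg\">";
        System.out.println(imageUrls(html));
    }
}
```

For a live page, the same imageUrls logic could be applied to the Document returned by Jsoup.connect(url).get() instead of a parsed string.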
Upvotes: 1