Idan

Reputation: 5745

Extracting all images from website

I need to extract all images from a website. Up until now I have used Jsoup with

Elements images = node.select("img");

However, I realized that this can miss some images, for example the top image on this site, which is defined in the CSS.

Is there an easy way to not miss such images, and obtain all the images in a page as Jsoup elements? Note that 'there is no easy way to do this' is also a valid answer, I'd just like to know that. Thanks!

Upvotes: 1

Views: 862

Answers (4)

Amith

Reputation: 1434

To get the URLs of images defined in CSS using Java, see the answers to this Stack Overflow question: Get URL of images in CSS file using Java?

or

You could adapt logic like this: https://gist.github.com/parkerl/1771976
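As a rough illustration of the idea (not the code from the linked question or gist), here is a minimal sketch that pulls url(...) values out of CSS text with a regular expression, using only the Java standard library. The class name CssImageUrls and the method extract are made up for this example, and it assumes you have already downloaded the stylesheet text yourself. Note that url(...) can also point at fonts or other resources, so you would probably still need to filter the results by file extension.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Jsoup.
public class CssImageUrls {
    // Matches url(...) with optional single or double quotes around the value.
    private static final Pattern URL_PATTERN =
            Pattern.compile("url\\(\\s*(['\"]?)(.*?)\\1\\s*\\)", Pattern.CASE_INSENSITIVE);

    /** Returns every url(...) value found in the given CSS text, in order. */
    public static List<String> extract(String css) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL_PATTERN.matcher(css);
        while (m.find()) {
            String value = m.group(2).trim();
            // Skip empty values and inline data: URIs, which are not separate files.
            if (!value.isEmpty() && !value.startsWith("data:")) {
                urls.add(value);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        String css = ".header { background-image: url(\"../img/top.jpg\"); }\n"
                   + ".logo { background: url(logo.png) no-repeat; }";
        System.out.println(extract(css));
    }
}
```

A regex like this is a simplification: it does not understand CSS comments, escapes, or @import rules, so a real CSS parser may be a better fit for messy stylesheets.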

Upvotes: 1

Sushruth Siv

Reputation: 45

The following code will get most of the images, except the ones defined in CSS. Note that image paths in CSS are relative to the path of the CSS file itself.

function img_find() {
    var imgs = document.getElementsByTagName("img");
    var imgSrcs = [];

    for (var i = 0; i < imgs.length; i++) {
        imgSrcs.push(imgs[i].src);
    }

    return imgSrcs;
}

You may also try reading the computed background-image of a single element, for example the first img:

window.getComputedStyle(document.getElementsByTagName("img")[0], null).getPropertyValue('background-image')
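On the point about CSS image paths being relative to the CSS file: if you collect the paths on the Java side, one way to turn them into absolute URLs is java.net.URI.resolve. This is only a sketch; the class name CssUrlResolver and the example.com URLs are placeholders, and it assumes both strings are valid URIs (URI.create throws an IllegalArgumentException otherwise, for example on unescaped spaces).

```java
import java.net.URI;

// Hypothetical helper for illustration only.
public class CssUrlResolver {
    /** Resolves an image path found in a stylesheet against the stylesheet's own URL. */
    public static String resolve(String stylesheetUrl, String imagePath) {
        // Relative paths are resolved against the stylesheet location;
        // absolute URLs are returned unchanged.
        return URI.create(stylesheetUrl).resolve(imagePath).toString();
    }

    public static void main(String[] args) {
        // Placeholder URLs; substitute the real stylesheet location.
        System.out.println(resolve("https://example.com/css/main.css", "../img/top.jpg"));
        System.out.println(resolve("https://example.com/css/main.css", "logo.png"));
    }
}
```

The important detail is that the base is the URL of the stylesheet, not of the HTML page, unless the CSS comes from an inline style attribute or a style element in the page itself.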

Upvotes: 0

Niranjan

Reputation: 1834

Jsoup cannot help you extract background images defined in CSS.

Check this for a solution.

Upvotes: 1

trims

Reputation: 452

If you just need to get all images from the site, you first need to get all img tags and then get all background images from the CSS that is loaded on the site. Even this may not cover every image, because some can be loaded by JavaScript (or some tags or CSS can be changed by JavaScript).

To make sure you get everything, you need to load the website completely, as a browser does, get the DOM tree, iterate over it, check the CSS properties of each element for background images, and also extract all images from img tags.

I am not sure whether Jsoup can do this. It probably does not take JavaScript into account, and maybe not even CSS. It seems to be just an HTML parser.

I think the best way is to use the developer console of a browser (referring back to my answer to your previous question). From the console you can execute JavaScript that iterates over all elements in the DOM tree and collects a list of URLs, which you can later copy and use to load the images with some server-side script.

Upvotes: 2
