Reputation: 5745
I need to extract all images from a website. Up until now I used
Elements images = node.select("img");
with Jsoup. But I realized that this can miss some images, for example the top image on this site, which is defined in the CSS.
Is there an easy way to not miss such images, and obtain all the images in a page as Jsoup elements? Note that 'there is no easy way to do this' is also a valid answer, I'd just like to know that. Thanks!
Upvotes: 1
Views: 862
Reputation: 1434
You can find an answer in this Stack Overflow question: "Get URL of images in CSS file using Java?"
Alternatively, you could adapt the logic in this gist: https://gist.github.com/parkerl/1771976
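As a rough illustration of the idea (not the code from the linked question or gist), here is a minimal Java sketch that pulls the url(...) values out of CSS text with a regular expression. The class name CssImageUrls and the sample CSS are made up for this example. It assumes you have already downloaded the stylesheet as a string, and it will also pick up non-image URLs such as fonts, so you may want to filter by file extension afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CssImageUrls {
    // Matches url(...) with optional single or double quotes around the value.
    private static final Pattern URL_PATTERN =
            Pattern.compile("url\\(\\s*(['\"]?)(.*?)\\1\\s*\\)", Pattern.CASE_INSENSITIVE);

    /** Returns the raw url(...) values found in the CSS text, in order of appearance. */
    public static List<String> extractUrls(String css) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL_PATTERN.matcher(css);
        while (m.find()) {
            String value = m.group(2).trim();
            // Skip empty values and inline data: URIs, which are not links to fetch.
            if (!value.isEmpty() && !value.startsWith("data:")) {
                urls.add(value);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        // Hypothetical stylesheet content, for illustration only.
        String css = ".top { background-image: url(\"img/header.png\"); }\n"
                   + ".logo { background: #fff url(/static/logo.gif) no-repeat; }";
        System.out.println(extractUrls(css));
    }
}
```

A regex is a shortcut here; a real CSS parser would cope better with comments and escaped characters.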
Upvotes: 1
Reputation: 45
The following code will get most of the images, except the ones defined in CSS. Note that relative image paths in CSS are resolved against the path of the CSS file itself, not the page.
function img_find() {
    var imgs = document.getElementsByTagName("img");
    var imgSrcs = [];
    for (var i = 0; i < imgs.length; i++) {
        imgSrcs.push(imgs[i].src);
    }
    return imgSrcs;
}
You may also try
window.getComputedStyle(document.getElementsByTagName("img")[0], null).getPropertyValue('background-image')
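Since the question is about Java, here is a small sketch of the point about CSS paths being relative to the stylesheet. It uses java.net.URI from the standard library. The class name CssUrlResolver and the example.com URLs are placeholders. It assumes the url(...) value is already a valid URI (no unescaped spaces), otherwise URI.create will throw.

```java
import java.net.URI;

public class CssUrlResolver {
    /** Resolves a url(...) value against the URL of the stylesheet that contains it. */
    public static String resolve(String stylesheetUrl, String cssUrlValue) {
        // URI.resolve handles relative paths, root-relative paths and absolute URLs.
        return URI.create(stylesheetUrl).resolve(cssUrlValue).toString();
    }

    public static void main(String[] args) {
        // Hypothetical stylesheet location, for illustration only.
        String sheet = "https://example.com/assets/css/site.css";
        System.out.println(resolve(sheet, "../img/header.png"));
        System.out.println(resolve(sheet, "/static/logo.gif"));
    }
}
```

The key detail is that the base passed in is the URL of the CSS file, not the URL of the HTML page that links to it.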
Upvotes: 0
Reputation: 1834
Jsoup cannot be of any help to you in extracting background-images defined in CSS.
Check this for a solution.
Upvotes: 1
Reputation: 452
If you just need to get all images from the site, you first need to get all image tags and then get all background images from the CSS that is loaded on the site. Even this may not give you every image, because some can be loaded by JavaScript (or some tags or CSS can be changed by JavaScript).
To make sure you get everything, you need to load the website completely, as a browser does, get the DOM tree, iterate over it, check the CSS properties of each element for background images, and also extract all images from image tags.
I am not sure whether Jsoup can do this. It probably does not take JavaScript into account, and maybe not even CSS. It seems to be just an HTML parser.
I think the best way is to use the developer console of a browser (referring back to my answer to your previous question). From the console you can execute JavaScript that iterates over all elements in the DOM tree and collects a list of URLs, which you can then copy and use to download the images with a server-side script.
Upvotes: 2