How to get all image URLs in a page with Jsoup?

Question

I'm using Jsoup to crawl pages. I often need to get the URLs of all images in a page, or in part of a page, and put them in an ArrayList. Suppose the following document:



  
    
News Page

Grumpy wizards make toxic brew for the evil Queen and Jack.

The quick brown fox jumps over the lazy dog.

Pack my box with five dozen liquor jugs.

This is how I do it:

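Document document = Jsoup.parse(html);
// select() takes a CSS selector string, so "img" must be quoted
Elements images = document.select("img");

ArrayList<String> binaryUrls = new ArrayList<String>();
for (Element image : images) {
    // absUrl resolves the src attribute to an absolute URL
    binaryUrls.add(image.absUrl("src"));
}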

And the result:

['http://www.newssite.com/images/img01.jpg', 'http://www.newssite.com/images/img02.jpg', 'http://www.newssite.com/images/img03.jpg']

It works, but I want to know whether there is a shorter way to do this using only Jsoup.

Our production environment still uses Java 6. If possible, I would like to see both a Java 6 approach and a Java 8 approach using lambdas.

JM Yang · Accepted Answer

I have no suggestion for Java 6.

Using a lambda in Java 8:

ArrayList<String> binaryUrls = Jsoup.parse(html).select("img")
    .stream().map(p -> p.absUrl("src"))
    .collect(Collectors.toCollection(ArrayList::new));

Or, if the return type can simply be List:

List<String> binaryUrls = Jsoup.parse(html).select("img")
    .stream().map(p -> p.absUrl("src"))
    .collect(Collectors.toList());
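
For a variant that needs no lambdas, one option worth trying is Elements.eachAttr with the abs: prefix, which returns the attribute values as a List in a single call. The sketch below assumes jsoup 1.10.2 or later is on the classpath (eachAttr is not available in older releases), and whether a given jsoup release still runs on Java 6 should be checked against its release notes. The class name ImageUrls, the helper imageUrls, and the sample HTML and base URI are made up for illustration.

```java
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class ImageUrls {

    // Hypothetical helper: returns the absolute src URL of every <img>
    // that has a src attribute. Assumes jsoup 1.10.2+ (Elements.eachAttr).
    static List<String> imageUrls(String html, String baseUri) {
        // Passing a base URI lets jsoup resolve relative src values.
        Document document = Jsoup.parse(html, baseUri);
        // "img[src]" skips images without a src attribute;
        // the "abs:" prefix asks for the resolved, absolute URL.
        return document.select("img[src]").eachAttr("abs:src");
    }

    public static void main(String[] args) {
        // Illustrative input: one relative and one absolute image URL.
        String html = "<div><img src=\"/images/img01.jpg\">"
                + "<img src=\"http://www.newssite.com/images/img02.jpg\"></div>";
        List<String> urls = imageUrls(html, "http://www.newssite.com/");
        System.out.println(urls);
    }
}
```

Note that the result is a List rather than an ArrayList. If an ArrayList is required, the returned list can be passed to the ArrayList constructor.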
