Reputation: 165
I'm using Jsoup to crawl pages. I often need to collect the URLs of all images inside a page, or inside a fragment of a page, and put them in an ArrayList<String>. Suppose the following document:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>News Page</title>
</head>
<body>
<div class="news">
<div class="new">
<div class="image">
<img src="../images/img01.jpg" />
</div>
<div class="info">
<p class="title">
Grumpy wizards make toxic brew for the evil Queen and Jack.
</p>
</div>
</div>
<div class="new">
<div class="image">
<img src="../images/img02.jpg" />
</div>
<div class="info">
<p class="title">
The quick brown fox jumps over the lazy dog.
</p>
</div>
</div>
<div class="new">
<div class="image">
<img src="../images/img03.jpg" />
</div>
<div class="info">
<p class="title">
Pack my box with five dozen liquor jugs.
</p>
</div>
</div>
</div>
</body>
</html>
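This is how I do it:
Document document = Jsoup.parse(html);
// The selector must be a String
Elements images = document.select("img");
ArrayList<String> binaryUrls = new ArrayList<String>();
for (Element image : images) {
    // absUrl resolves src against the document's base URI;
    // without a base URI, a relative src yields an empty string
    binaryUrls.add(image.absUrl("src"));
}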
And the result:
['http://www.newssite.com/images/img01.jpg', 'http://www.newssite.com/images/img02.jpg', 'http://www.newssite.com/images/img03.jpg']
It works, but I would like to know whether there is a shorter way to do it using only Jsoup.
Our production environment still runs Java 6. If possible, I would like to see both a Java 6 approach and a Java 8 approach with lambdas.
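The absolute URLs in the result above depend on the base URI the document was parsed with. As a rough illustration using only the standard library (not Jsoup itself), the sketch below resolves the relative src values against a base URL. The base URL http://www.newssite.com/news/ and the names RelativeUrlDemo and resolveAll are assumptions made for this example; they do not come from the original page.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RelativeUrlDemo {

    // Resolves each relative path against the base URI.
    // This mirrors, in spirit, what an absolute-URL lookup has to do.
    static List<String> resolveAll(String baseUri, List<String> relativePaths) {
        URI base = URI.create(baseUri);
        List<String> absolute = new ArrayList<String>();
        for (String path : relativePaths) {
            absolute.add(base.resolve(path).toString());
        }
        return absolute;
    }

    public static void main(String[] args) {
        // Assumed base URL: the page is taken to live under /news/
        String baseUri = "http://www.newssite.com/news/";
        List<String> srcs = Arrays.asList(
                "../images/img01.jpg",
                "../images/img02.jpg",
                "../images/img03.jpg");
        System.out.println(resolveAll(baseUri, srcs));
    }
}
```

The loop is written without lambdas, so it compiles on Java 6 as well.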
Upvotes: 3
Views: 1308
Reputation: 1208
I have no suggestion for Java 6.
Using a lambda in Java 8:
ArrayList<String> binaryUrls = Jsoup.parse(html).select("img")
.stream().map(p -> p.absUrl("src"))
.collect(Collectors.toCollection(ArrayList::new));
Or, if the return type can be just List<String>:
List<String> binaryUrls = Jsoup.parse(html).select("img")
.stream().map(p -> p.absUrl("src"))
.collect(Collectors.toList());
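For the case without lambdas, one option that may be worth trying is Elements.eachAttr, which returns the attribute value of every matched element as a List<String> and accepts the abs: prefix for absolute URLs. This is a minimal sketch, assuming a Jsoup version that provides eachAttr (whether that version runs on Java 6 would need checking); the class name ImageUrls, the method imageUrls, and the base URL are made up for the example.

```java
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class ImageUrls {

    // Collects the absolute src of every <img> that has a src attribute.
    // baseUri is needed so that relative paths can be resolved.
    static List<String> imageUrls(String html, String baseUri) {
        Document document = Jsoup.parse(html, baseUri);
        return document.select("img[src]").eachAttr("abs:src");
    }

    public static void main(String[] args) {
        String html = "<div class=\"image\"><img src=\"../images/img01.jpg\" /></div>";
        // Assumed base URL, chosen only for illustration
        System.out.println(imageUrls(html, "http://www.newssite.com/news/"));
    }
}
```

To restrict the search to a fragment of the page, the selector can be narrowed, for example "div.news img[src]".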
Upvotes: 1