Reputation: 149
I'm trying to figure out how to separate a link from the useless information around it with jsoup.
The page source I need to parse is here:
view-source:https://vk.com/search?c%5Bq%5D=%D0%BA%D0%BE%D1%82&c%5Bsection%5D=communities
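import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class TestSoup {
    public static void main(String[] args) throws Exception {
        // Fetch and parse the page
        Document doc = Jsoup.connect("https://vk.com/smcat").get();
        Elements links;
        //links = doc.select("div > a > img ");
        // Select every element that carries a data-src_big attribute
        links = doc.select("[data-src_big]");
        System.out.println(links);
    }
}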
My output now:
<img src="https://pp.vk.me/c636126/v636126727/35e1b/ludjlj7T4i8.jpg" class="ph_img" data-id="-23530818_436648332" data-src_big="https://pp.vk.me/c636126/v636126727/35e1c/a1IyGrtjzUQ.jpg|600|448">
Can someone explain how I can extract the second link (the data-src_big value) from my output? Many thanks.
Upvotes: 0
Views: 65
Reputation: 124285
data-src_big is an attribute, and each element can have its own value for it.
To iterate over link elements you can use
for (Element el : links){
..
}
To get the value of a specified attribute from an element you can use
el.attr("attribute_name")
If the value of the attribute is a URL written as a relative path like ./foo/bar.jpg
but you want it as an absolute path like http://server.com/foo/bar.jpg
you can use
el.absUrl("attribute_name")
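To make the difference between attr and absUrl concrete, here is a minimal sketch. It parses an inline HTML string instead of fetching vk.com; the class name AttrDemo, the helper methods, the sample markup and the base URI http://server.com/ are all made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the original question.
public class AttrDemo {

    // Collects the raw attribute value from every element that has the attribute.
    static List<String> rawValues(Document doc, String attrName) {
        List<String> out = new ArrayList<>();
        for (Element el : doc.select("[" + attrName + "]")) {
            out.add(el.attr(attrName));
        }
        return out;
    }

    // Same as above, but resolves each value against the document's base URI.
    static List<String> absoluteValues(Document doc, String attrName) {
        List<String> out = new ArrayList<>();
        for (Element el : doc.select("[" + attrName + "]")) {
            out.add(el.absUrl(attrName));
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample markup and base URI are assumptions chosen for this example.
        String html = "<div><a href=\"#\"><img src=\"./foo/bar.jpg\"></a></div>";
        Document doc = Jsoup.parse(html, "http://server.com/");

        System.out.println(rawValues(doc, "src"));      // value as written in the HTML
        System.out.println(absoluteValues(doc, "src")); // value resolved against the base URI
    }
}
```

When the document comes from Jsoup.connect(...).get(), the base URI is normally the fetched URL, so absUrl can resolve relative paths without further setup.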
Upvotes: 2
Reputation: 11712
You can find this in the jsoup cookbook. In short, you use the attr method of Element:
links = doc.select("[data-src_big]");
String linkStr = links.attr("data-src_big");
Note that links is of type Elements, and its attr() only returns the value from the first matched element that has the attribute.
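If more than one element matches, a sketch like the following shows the difference between reading only the first value and reading all of them. It assumes a jsoup version that provides Elements.eachAttr (older releases may not have it); the class name FirstVsAll, the helpers and the sample markup are invented for the example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

import java.util.List;

// Hypothetical demo class, not part of the original answer.
public class FirstVsAll {

    // Value from the first matched element that has the attribute.
    static String first(Document doc, String attrName) {
        Elements matches = doc.select("[" + attrName + "]");
        return matches.attr(attrName);
    }

    // Values from every matched element that has the attribute.
    static List<String> all(Document doc, String attrName) {
        Elements matches = doc.select("[" + attrName + "]");
        return matches.eachAttr(attrName);
    }

    public static void main(String[] args) {
        // Two sample images; the attribute values are placeholders.
        String html = "<img data-src_big=\"a.jpg|600|448\"><img data-src_big=\"b.jpg|600|448\">";
        Document doc = Jsoup.parse(html);

        System.out.println(first(doc, "data-src_big"));
        System.out.println(all(doc, "data-src_big"));
    }
}
```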
Upvotes: 1
Reputation: 1477
You just need to get both src and data-src_big from the elements found by div > a > img using the attr(name) method:
for (Element element : doc.select("div > a > img")) {
    String src = element.attr("src");
    String big = element.attr("data-src_big");
}
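In the output shown in the question, the data-src_big value is a URL followed by |600|448. If only the URL is wanted, one possible approach is to cut the value at the first | character. This is a plain-Java sketch; the class name BigImageValue and the helper urlPart are hypothetical, and reading the trailing numbers as image dimensions is only a guess.

```java
// Hypothetical helper class, not part of the original answer.
public class BigImageValue {

    // Returns the part before the first '|', or the whole value if there is no '|'.
    static String urlPart(String attrValue) {
        int bar = attrValue.indexOf('|');
        return bar < 0 ? attrValue : attrValue.substring(0, bar);
    }

    public static void main(String[] args) {
        // Attribute value copied from the output in the question.
        String value = "https://pp.vk.me/c636126/v636126727/35e1c/a1IyGrtjzUQ.jpg|600|448";
        System.out.println(urlPart(value));
    }
}
```

Inside the loop above this would be used as urlPart(element.attr("data-src_big")).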
Upvotes: 1