Kojer Defor

Reputation: 149

Get concrete URL with Jsoup

I'm trying to figure out how to strip the useless information from a link with jsoup. The page source I need to parse is here:

view-source:https://vk.com/search?c%5Bq%5D=%D0%BA%D0%BE%D1%82&c%5Bsection%5D=communities

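import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class TestSoup {
    public static void main(String[] args) throws Exception {
        Document doc = Jsoup.connect("https://vk.com/smcat").get();
        // Earlier attempt: links = doc.select("div > a > img");
        // Select every element that carries a data-src_big attribute.
        Elements links = doc.select("[data-src_big]");

        System.out.println(links);
    }
}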

My output now:

<img src="https://pp.vk.me/c636126/v636126727/35e1b/ludjlj7T4i8.jpg" class="ph_img" data-id="-23530818_436648332" data-src_big="https://pp.vk.me/c636126/v636126727/35e1c/a1IyGrtjzUQ.jpg|600|448">

Can someone explain how I can extract the second link (the one in data-src_big) from my output? Many thanks.

Upvotes: 0

Views: 65

Answers (3)

Pshemo

Reputation: 124285

data-src_big is an attribute, and each element can have its own value for it.

To iterate over the selected elements you can use

for (Element el : links) {
    // work with el here
}

To get the value of a specific attribute from an element you can use

el.attr("attribute_name")

If the value of the attribute is a URL written as a relative path like ./foo/bar.jpg, but you want it as an absolute path like http://server.com/foo/bar.jpg, you can use

el.absUrl("attribute_name")
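
Here is a small sketch that puts these pieces together. It is not tied to the vk.com page: the HTML snippet, the base URI http://server.com/ and the class name AttrVsAbsUrl are all made up for illustration, and it assumes jsoup is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AttrVsAbsUrl {
    // Hypothetical markup and base URI, used only for this illustration.
    static final String HTML =
            "<div><a href='#'><img src='./foo/bar.jpg' data-src_big='./foo/big.jpg'></a></div>";
    static final String BASE_URI = "http://server.com/";

    // Returns the attribute value exactly as written in the markup.
    static String raw(String attributeName) {
        Document doc = Jsoup.parse(HTML, BASE_URI);
        Element img = doc.select("img").first();
        return img.attr(attributeName);
    }

    // Returns the attribute value resolved against the document's base URI.
    static String absolute(String attributeName) {
        Document doc = Jsoup.parse(HTML, BASE_URI);
        Element img = doc.select("img").first();
        return img.absUrl(attributeName);
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML, BASE_URI);
        for (Element el : doc.select("[data-src_big]")) {
            System.out.println(el.attr("data-src_big"));   // value as written
            System.out.println(el.absUrl("data-src_big")); // value resolved to an absolute URL
        }
    }
}
```

When the document comes from Jsoup.connect(...).get(), the base URI is taken from the fetched page, so absUrl should work without passing one explicitly.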

Upvotes: 2

luksch

Reputation: 11712

You can find this in the Jsoup cookbook. In short, you use the attr method of Element:

links = doc.select("[data-src_big]");
String linkStr = links.attr("data-src_big");

Note that links is of type Elements, and calling attr() on it returns the value from only the first matched element that has the attribute.
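
To make that difference concrete, here is a minimal sketch, assuming a reasonably recent jsoup version that provides Elements.eachAttr. The two-image HTML string and the class name FirstVsAll are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.select.Elements;

import java.util.List;

public class FirstVsAll {
    // Hypothetical markup with two matching elements.
    static final String HTML =
            "<img data-src_big='https://example.com/a.jpg|600|448'>"
          + "<img data-src_big='https://example.com/b.jpg|800|600'>";

    static Elements select() {
        return Jsoup.parse(HTML).select("[data-src_big]");
    }

    // Elements.attr: value from the first matched element that has the attribute.
    static String first() {
        return select().attr("data-src_big");
    }

    // Elements.eachAttr: one value per matched element that has the attribute.
    static List<String> all() {
        return select().eachAttr("data-src_big");
    }

    public static void main(String[] args) {
        System.out.println(first());
        System.out.println(all());
    }
}
```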

Upvotes: 1

kgeorgiy

Reputation: 1477

You just need to read both src and data-src_big from the elements matched by div > a > img, using the attr(name) method:

for (Element element : doc.select("div > a > img")) {
    String src = element.attr("src");
    String big = element.attr("data-src_big");
}
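
The data-src_big value in the question's output has extra fields after the URL, separated by | characters. I'm assuming these are the image's width and height, but the page doesn't say. If you only want the URL, something like the following sketch should work; the helper name urlPart and the class name BigImageUrl are made up, and it uses plain Java only.

```java
public class BigImageUrl {
    // Returns everything before the first '|', or the whole value if there is none.
    static String urlPart(String attrValue) {
        int bar = attrValue.indexOf('|');
        return bar < 0 ? attrValue : attrValue.substring(0, bar);
    }

    public static void main(String[] args) {
        // Value copied from the question's output.
        String big = "https://pp.vk.me/c636126/v636126727/35e1c/a1IyGrtjzUQ.jpg|600|448";
        System.out.println(urlPart(big));
    }
}
```

In the loop above you would call urlPart(element.attr("data-src_big")) for each element.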

Upvotes: 1
