Tano

Reputation: 1377

Java jsoup link extracting

I am trying to extract the links within a given element using jsoup. Here is what I have done, but it's not working:

    Document doc = Jsoup.connect(url).get();
    Elements element = doc.select("section.row");
    Element s = element.first();
    Elements se = s.getElementsByTag("article");

    for (Element link : se) {
        System.out.println("link :" + link.select("href"));
    }

Here is the HTML: (image not available)

What I am trying to do is get all the links within the article elements. I thought I should first select the section with class="row" and then somehow derive the links from the article elements inside it, but I could not make it work.

Upvotes: 1

Views: 126

Answers (2)

Eritrean

Reputation: 16498

Try this:

    Document doc = Jsoup.connect(url).get();

    Elements section = doc.select("#main"); // select the section with id = main
    Elements allArtTags = section.select("article"); // select all article tags in that section
    for (Element artTag : allArtTags) {
        Elements atags = artTag.select("a"); // select all a tags in each article tag
        for (Element atag : atags) {
            System.out.println(atag.text()); // print the link text
            System.out.println(atag.attr("href")); // print the link target
        }
    }
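
The nested loops above can also be collapsed into a single descendant selector. The following is only a sketch: the class name `ArticleLinks`, the method `extractLinks`, the sample markup, and the base URI `https://example.com/` are all made up for illustration, since the actual page structure is only known from the question's description (articles inside `section.row`). It uses `a[href]` to skip anchors without an `href`, and `abs:href` to resolve relative links against the base URI.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ArticleLinks {

    // Collects the absolute URL of every <a href> nested in an <article>
    // that itself sits inside <section class="row">.
    static List<String> extractLinks(Document doc) {
        List<String> links = new ArrayList<>();
        for (Element a : doc.select("section.row article a[href]")) {
            // "abs:href" resolves the attribute against the document's base URI
            links.add(a.attr("abs:href"));
        }
        return links;
    }

    public static void main(String[] args) {
        // Hypothetical markup standing in for the real page
        String html = "<section class=\"row\">"
                + "<article><a href=\"/first\">First</a></article>"
                + "<article><a href=\"https://example.org/second\">Second</a>"
                + "<a>no href</a></article>"
                + "</section>"
                + "<a href=\"/outside\">Outside</a>";

        // The base URI is an assumption; with Jsoup.connect(url).get() it is set from the URL
        Document doc = Jsoup.parse(html, "https://example.com/");

        for (String link : extractLinks(doc)) {
            System.out.println(link);
        }
    }
}
```

With a live page, `Jsoup.connect(url).get()` would replace the `Jsoup.parse` call, and the selector may need adjusting to the real markup.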

Upvotes: 1

Exceptyon

Reputation: 1582

I'm using this in one of my projects:

final Elements elements = doc.select("div.item_list_section.item_description");

You'll have to select the elements you want to extract links from, then inspect each one:

private static ... inspectElement(Element e) {
    try {
        final String name = getAttr(e, "a[href]");
        final String link = e.select("a").first().attr("href");
        //final String price = getAttr(e, "span.item_price");
        //final String category = getAttr(e, "span.item_category");
        //final String spec = getAttr(e, "span.item_specs");
        //final String datetime = e.select("time").attr("datetime");

        ...
    }
    catch (Exception ex) { return null; }
}

private static String getAttr(Element e, String what) {
    try {
        return e.select(what).first().text();
    }
    catch (Exception ex) { return ""; }
}
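
The helpers above rely on catching the `NullPointerException` thrown when `first()` finds nothing. A possible alternative, sketched here under the assumption of a reasonably recent jsoup version that provides `Element.selectFirst`, is to check for `null` explicitly. The class name `SafeSelect` and the methods `firstText` and `firstHref` are hypothetical names, not part of jsoup or of the code above.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class SafeSelect {

    // Text of the first element matching cssQuery, or "" when nothing matches.
    static String firstText(Element root, String cssQuery) {
        Element match = root.selectFirst(cssQuery); // null when there is no match
        return match == null ? "" : match.text();
    }

    // href of the first <a href> under root, or "" when there is none.
    static String firstHref(Element root) {
        Element a = root.selectFirst("a[href]");
        return a == null ? "" : a.attr("href");
    }

    public static void main(String[] args) {
        // Hypothetical markup for illustration only
        Element withLink = Jsoup.parse("<div><a href=\"/item/1\">Item</a></div>").body();
        Element empty = Jsoup.parse("<div></div>").body();

        System.out.println(firstText(withLink, "a[href]"));
        System.out.println(firstHref(withLink));
        System.out.println(firstHref(empty).isEmpty());
    }
}
```

This keeps the "empty string when missing" behaviour of `getAttr` without swallowing unrelated exceptions.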

Upvotes: 0
