benscabbia

Reputation: 18072

Parsing links for href value using JSoup works for a single link, but not for an array of links

I have managed to grab the href links using JSoup, and I can get both the relative and the absolute value of an href for a single link, as shown below:

//works perfectly, website: bbc.co.uk
Document document = Jsoup.connect(url).get();
Element link = document.select("a").last();
String relHref = link.attr("href");
String absHref = link.attr("abs:href");
System.out.println(relHref);   
System.out.println(absHref);

 //output: 
relHref: /help/web/links/
absHref: http://www.bbc.co.uk/help/web/links/

Using Element link = document.select("a").first(); also works. However, when I put this in a loop to iterate over all of the grabbed links and print each one, I don't get the expected results. Here is my code:

//not working
Elements links = document.select("a");
for (int i = 0; i < links.size(); i++) {
    String relHref = links.attr("href");
    String absHref = links.attr("abs:href");
    System.out.println(relHref);
    System.out.println(absHref);
}
//output 
http://m.bbc.co.uk
http://m.bbc.co.uk
http://m.bbc.co.uk
....

I know the links collection (of type Elements) holds the correct data: if I print each element in it, all of the <a> tags are displayed, i.e.

for (Element link : links) {
    System.out.println(link);
}
//output 116 links:

<a href="http://m.bbc.co.uk">mobile site</a>
<a href="/"> <img src="http://static.bbci.co.uk/frameworks/barlesque/2.72.5/orb/4/img/bbc-blocks-dark.png" width="84" height="24" alt="BBC"> </a>
<a href="#h4discoveryzone">Skip to content</a>
<a id="orb-accessibility-help" href="/accessibility/">Accessibility Help</a>
....

But how do I get relHref and absHref for every link in the collection? Instead, my code just prints the first link over and over again. I've been at this for hours, so I'm probably making a silly mistake somewhere, but any help is appreciated!

Thanks.

Upvotes: 0

Views: 1178

Answers (2)

T.J. Crowder

Reputation: 1074148

On this line:

String relHref = links.attr("href");

...how is it supposed to know you're talking about the ith link? (It doesn't: Elements#attr returns the value from the first element in the Elements collection that has that attribute.)

You want

String relHref = links.get(i).attr("href");

...which gets the specific link you're interested in via Elements#get, then uses Node#attr on it.

That said, though, I would just use the enhanced for loop:

for (Element link : document.select("a")) {
    String relHref = link.attr("href");
    String absHref = link.attr("abs:href");
    System.out.println(relHref);
    System.out.println(absHref);
}

...unless you need i for something.
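To make the difference concrete, here is a minimal, self-contained sketch that parses an inline HTML string instead of fetching bbc.co.uk. The class name LinkAttrDemo, the sample markup, and the base URI are assumptions made up for illustration; it needs the jsoup jar on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

class LinkAttrDemo {
    // Hypothetical sample markup, loosely based on the links shown in the question.
    static final String HTML =
          "<a href=\"http://m.bbc.co.uk\">mobile site</a>"
        + "<a href=\"/accessibility/\">Accessibility Help</a>"
        + "<a href=\"/help/web/links/\">Links</a>";

    // Calls attr on each individual Element, so every link is visited.
    static List<String> absoluteLinks(String html, String baseUri) {
        Document document = Jsoup.parse(html, baseUri);
        List<String> result = new ArrayList<>();
        for (Element link : document.select("a")) {
            result.add(link.attr("abs:href"));
        }
        return result;
    }

    // Calls attr on the whole Elements collection: only one value comes back,
    // taken from the first matched element that has the attribute.
    static String collectionAttr(String html, String baseUri) {
        Elements links = Jsoup.parse(html, baseUri).select("a");
        return links.attr("href");
    }

    public static void main(String[] args) {
        String baseUri = "http://www.bbc.co.uk/"; // assumed base URI for the sample
        System.out.println("Elements#attr: " + collectionAttr(HTML, baseUri));
        for (String abs : absoluteLinks(HTML, baseUri)) {
            System.out.println("per element:   " + abs);
        }
    }
}
```

Passing a base URI to Jsoup.parse is what lets abs:href resolve the relative links here; Jsoup.connect(url).get() sets it from the fetched URL.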

Upvotes: 2

Hovercraft Full Of Eels

Reputation: 285403

You need to call the Elements method get(int index) inside your for loop to get each Element held by your Elements collection.

e.g.,

Elements links = document.select("a");
for(int i=0; i < links.size(); i++) {
    Element ele = links.get(i);

    // use ele here to extract info from each Element

}
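One thing you might extract from each ele is ele.attr("abs:href"), which resolves a relative href against the document's base URI. The plain-Java sketch below only approximates that resolution step with java.net.URI; it is not jsoup's own code, and the class and method names are hypothetical.

```java
import java.net.URI;

class AbsHrefSketch {
    // Resolves an href against a base URI, roughly what abs:href does
    // for simple cases like the ones in the question.
    static String toAbsolute(String baseUri, String href) {
        return URI.create(baseUri).resolve(href).toString();
    }

    public static void main(String[] args) {
        String base = "http://www.bbc.co.uk/"; // assumed base URI
        String[] hrefs = {"/help/web/links/", "/accessibility/", "http://m.bbc.co.uk"};
        for (String href : hrefs) {
            System.out.println(href + " -> " + toAbsolute(base, href));
        }
    }
}
```

An href that is already absolute comes back unchanged, while a path such as /help/web/links/ is joined to the scheme and host of the base.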

Upvotes: 1
