Reputation: 18072
I have managed to successfully grab the href links using JSoup. I have also managed to grab the relative value and absolute value of a href for a single link. As shown below:
//works perfectly, website: bbc.co.uk
Document document = Jsoup.connect(url).get();
Element link = document.select("a").last();
String relHref = link.attr("href");
String absHref = link.attr("abs:href");
System.out.println(relHref);
System.out.println(absHref);
//output:
relHref: /help/web/links/
absHref: http://www.bbc.co.uk/help/web/links/
I can even use Element link = document.select("a").first(); and this also works. However, when I try to put this in a loop to iterate over all of the grabbed links and print each one, it doesn't give me the expected results. Here is my code:
//not working
Elements links = document.select("a");
for(int i=0; i<links.size(); i++){
String relHref = links.attr("href");
String absHref = links.attr("abs:href");
System.out.println(relHref);
System.out.println(absHref);
}
//output
http://m.bbc.co.uk
http://m.bbc.co.uk
http://m.bbc.co.uk
....
I know the links collection (of type Elements) holds the correct data: if I print its elements, it displays all of the a tags, i.e.
for (Element link : links) {
System.out.println(link);
}
//output 116 links:
<a href="http://m.bbc.co.uk">mobile site</a>
<a href="/"> <img src="http://static.bbci.co.uk/frameworks/barlesque/2.72.5/orb/4/img/bbc-blocks-dark.png" width="84" height="24" alt="BBC"> </a>
<a href="#h4discoveryzone">Skip to content</a>
<a id="orb-accessibility-help" href="/accessibility/">Accessibility Help</a>
....
But how do I get the relHref and absHref for each element in the collection? Instead, my code just prints out the first link over and over again. I've been going at this for hours, so I'm probably making a silly mistake somewhere, but help is appreciated!
Thanks.
Upvotes: 0
Views: 1178
Reputation: 1074148
On this line:
String relHref = links.attr("href");
...how is it supposed to know you're talking about the i-th link? (It doesn't: Elements#attr always returns the value from the first element in the Elements collection that has that attribute.)
You want
String relHref = links.get(i).attr("href");
...which gets the specific link you're interested in via Elements#get, then uses Node#attr on it.
That said, though, I would just use the enhanced for loop:
for (Element link : document.select("a")) {
String relHref = link.attr("href");
String absHref = link.attr("abs:href");
System.out.println(relHref);
System.out.println(absHref);
}
...unless you need i for something.
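As a side note, in case it helps to sanity-check what the fixed loop prints: abs:href resolves the raw href against the document's base URI. Here is a rough, jsoup-free sketch of that idea using only java.net.URI. The class name AbsHrefSketch, the helper toAbsolute, and the base URI "http://www.bbc.co.uk/" are assumptions for illustration, and jsoup's own resolution may differ in edge cases.

```java
import java.net.URI;

public class AbsHrefSketch {
    // Hypothetical helper (not part of jsoup): resolves a raw href against
    // a base URI, which is roughly what the abs: prefix asks jsoup to do.
    public static String toAbsolute(String baseUri, String href) {
        return URI.create(baseUri).resolve(href).toString();
    }

    public static void main(String[] args) {
        // Assumed base URI of the fetched page.
        String base = "http://www.bbc.co.uk/";
        // A few href values taken from the question's output.
        String[] hrefs = {
            "http://m.bbc.co.uk",
            "/",
            "/accessibility/",
            "/help/web/links/"
        };
        // One href per iteration, mirroring the per-element loop above.
        for (String href : hrefs) {
            System.out.println(href + " -> " + toAbsolute(base, href));
        }
    }
}
```

An href that is already absolute comes back unchanged, while a relative one picks up the scheme and host from the base, so with the loop fixed you should see a different pair of values for each link rather than the first one repeated.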
Upvotes: 2
Reputation: 285403
You need to use the Elements method get(int index) inside your for loop to get each Element held by your Elements.
e.g.,
Elements links = document.select("a");
for(int i=0; i < links.size(); i++) {
Element ele = links.get(i);
// use ele here to extract info from each Element
}
Upvotes: 1