Reputation: 105
<article itemprop="articleBody">
<p channel="wp.com" class="interstitial-link">
<i>
[<a href="www.URL.com" shape="rect">Link Text</a>]
</i>
</p>
</article>
How would I retrieve the URL and link text with Jsoup from this HTML document? I want the output to look like this:
"Link Text[URL]"
Edit: I want to retrieve only the links within
<article itemprop="articleBody"> ... </article>
not the entire page. Also, I want all of the links inside it, not just one.
Upvotes: 1
Views: 611
Reputation: 4047
// connect to URL and retrieve source code as document
Document doc = Jsoup.connect(url).get();
// find the link element in the article
Element link = doc
.select("article[itemprop=articleBody] p.interstitial-link i a")
.first();
// extract the link text
String linkText = link.ownText();
// extract the absolute URL of the href
// absUrl resolves a relative href against the document's base URI,
// whereas link.attr("href") returns the attribute value as written
String linkURL = link.absUrl("href");
// display
System.out.println(
String.format(
"%s[%s]",
linkText,
linkURL));
Read more about CSS selectors in the jsoup selector-syntax documentation.
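One thing the single-link snippet above leaves open: `first()` returns null when the selector matches nothing, so `link.ownText()` would throw a `NullPointerException` on a page without that markup. A minimal sketch of a guarded version follows. The class name `FirstArticleLink`, the helper `firstLink`, and the `http://example.com/` base URI are made up for illustration, and it parses an HTML string instead of fetching a page so that it runs offline. It assumes a jsoup version that provides `selectFirst`.

```java
import java.util.Optional;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class FirstArticleLink {

    // Hypothetical helper: formats the first matching link as "text[url]",
    // or returns an empty Optional when the selector matches nothing.
    static Optional<String> firstLink(String html, String baseUri) {
        // The base URI lets absUrl resolve relative hrefs.
        Document doc = Jsoup.parse(html, baseUri);

        // selectFirst returns null when there is no match.
        Element link = doc.selectFirst(
                "article[itemprop=articleBody] p.interstitial-link i a");
        if (link == null) {
            return Optional.empty();
        }
        return Optional.of(
                String.format("%s[%s]", link.ownText(), link.absUrl("href")));
    }

    public static void main(String[] args) {
        // Sample markup modelled on the question; the base URI is an assumption.
        String html = "<article itemprop=\"articleBody\">"
                + "<p class=\"interstitial-link\"><i>"
                + "[<a href=\"http://www.URL.com\">Link Text</a>]"
                + "</i></p></article>";

        System.out.println(
                firstLink(html, "http://example.com/").orElse("no link found"));
    }
}
```

Returning an `Optional` is only one way to surface the missing-element case; a plain null check before calling `ownText()` would work just as well.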
You could also iterate each link in the article like this:
for (Element link : doc.select("article[itemprop=articleBody] a")) {
String linkText = link.ownText();
String linkURL = link.absUrl("href");
System.out.println(
String.format(
"%s[%s]",
linkText,
linkURL));
}
Output
Link Text[http://www.URL.com]
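If the formatted links are needed for further processing and not just printing, the loop above can collect them into a list. The following is a sketch under a few assumptions: the class name `ArticleLinks`, the helper `extractLinks`, and the sample markup are invented for illustration. It also makes two small changes to the loop: the selector `a[href]` skips anchors that have no `href` attribute, and `text()` is used instead of `ownText()` so that text inside nested tags such as `<b>` is included.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ArticleLinks {

    // Hypothetical helper: returns one "text[url]" entry per link found
    // inside <article itemprop="articleBody">, in document order.
    static List<String> extractLinks(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        List<String> result = new ArrayList<>();

        // Links outside the article element are not matched by this selector.
        for (Element link : doc.select("article[itemprop=articleBody] a[href]")) {
            // text() includes the text of child elements; ownText() would not.
            result.add(String.format("%s[%s]", link.text(), link.absUrl("href")));
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample markup and base URI are assumptions, not from a real page.
        String html = "<article itemprop=\"articleBody\">"
                + "<p><a href=\"/first\">First</a></p>"
                + "<p><a href=\"/second\">Second <b>link</b></a></p>"
                + "</article>"
                + "<a href=\"/outside\">Outside</a>";

        for (String line : extractLinks(html, "http://example.com/")) {
            System.out.println(line);
        }
    }
}
```

For a live page, `Jsoup.connect(url).get()` can replace the `Jsoup.parse` call; the fetched document then carries its own base URI.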
Upvotes: 1