Ahmed Ahmed

Reputation: 105

How to retrieve URL from link tags using Jsoup

<article itemprop="articleBody">
  <p channel="wp.com" class="interstitial-link">
     <i>
        [<a href="www.URL.com" shape="rect">Link Text</a>]
     </i>
  </p>
</article>

How would I retrieve the URL and link text from this HTML document with Jsoup? I want the output to look like this:

"Link Text[URL]"

Edit: I want to retrieve only the links within

<article itemprop="articleBody"> ... </article>

not from the entire page. Also, I want all the links within it, not just one.

Upvotes: 1

Views: 611

Answers (1)

Zack

Reputation: 4047

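    // imports needed: org.jsoup.Jsoup, org.jsoup.nodes.Document, org.jsoup.nodes.Element

    // connect to the URL and parse the response into a Document
    // (url is a String holding the page address; get() throws IOException)
    Document doc = Jsoup.connect(url).get();

    // find the first link element in the article;
    // first() returns null if nothing matches the selector
    Element link = doc
            .select("article[itemprop=articleBody] p.interstitial-link i a")
            .first();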

    // extract the link text
    String linkText = link.ownText();

    // extract the full URL of the href
    // absUrl resolves the href against the document's base URI, so prefer it
    // over link.attr("href"), which returns relative URLs unchanged
    String linkURL = link.absUrl("href");


    // display
    System.out.println(
            String.format(
                    "%s[%s]", 
                    linkText,
                    linkURL));

Read more about CSS Selectors
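
To see why the absolute form matters, here is a minimal sketch of how a relative href gets resolved against a page address. It uses java.net.URI from the standard library as a stand-in, not Jsoup itself, and the page address and the class name ResolveDemo are made up for illustration. An href without a scheme, such as www.URL.com, is treated as a relative path rather than a host name.

```java
import java.net.URI;

class ResolveDemo {

    // Resolves an href against the address of the page it appeared on.
    static String resolve(String baseUri, String href) {
        return URI.create(baseUri).resolve(href).toString();
    }

    public static void main(String[] args) {
        // Hypothetical page address, used only for this illustration.
        String base = "http://example.com/news/story.html";

        // No scheme: resolved as a path relative to the page.
        System.out.println(resolve(base, "www.URL.com"));
        // prints http://example.com/news/www.URL.com

        // Already absolute: returned unchanged.
        System.out.println(resolve(base, "http://www.URL.com"));
        // prints http://www.URL.com
    }
}
```

So if the page really contains href="www.URL.com", expect the resolved URL to point back into the site you fetched the page from.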


You could also iterate over every link in the article like this:

    for (Element link : doc.select("article[itemprop=articleBody] a")) {
        String linkText = link.ownText();
        String linkURL = link.absUrl("href");
        System.out.println(
                String.format(
                        "%s[%s]", 
                        linkText,
                        linkURL));
    }

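Output (assuming the href in the real page is the absolute URL http://www.URL.com; a scheme-less www.URL.com would be resolved relative to the page address):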

Link Text[http://www.URL.com]
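
If you want to try this without a network call, here is a self-contained sketch that parses an HTML string instead of fetching a page. The class name ArticleLinks, the sample markup, and the base URI http://example.com/ are assumptions for the example. It also uses text() rather than ownText(), so link text nested inside child tags is included, and a[href] to skip anchors that have no href.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

class ArticleLinks {

    // Returns one "text[url]" string per link inside the article body.
    static List<String> extractLinks(String html, String baseUri) {
        // The base URI lets absUrl() turn relative hrefs into absolute URLs.
        Document doc = Jsoup.parse(html, baseUri);
        List<String> result = new ArrayList<>();
        for (Element link : doc.select("article[itemprop=articleBody] a[href]")) {
            result.add(String.format("%s[%s]", link.text(), link.absUrl("href")));
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample markup: two links inside the article, one outside it.
        String html = "<article itemprop=\"articleBody\">"
                + "<p class=\"interstitial-link\"><i>"
                + "[<a href=\"http://www.URL.com\">Link Text</a>]"
                + "</i></p>"
                + "<p><a href=\"/about\">About</a></p>"
                + "</article>"
                + "<a href=\"/outside\">Outside</a>";

        // Only the two links inside the article are printed.
        extractLinks(html, "http://example.com/").forEach(System.out::println);
    }
}
```

Returning a list instead of printing inside the loop makes it easier to reuse the result, for example to write the links to a file.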

Upvotes: 1
