blue-sky

Reputation: 53806

Unable to return text within href (jSoup)

Here is the code I am using to read the link text "Test" from the HTML snippet below. How can I get the URL https://www.google.com out of the href attribute?

    Elements e = doc.getElementsByAttribute("href");
    Iterator<Element> href = e.iterator();
    while (href.hasNext()) {
        Element link = href.next();
        String text = link.text();
    }



   <a href="javascript:linkToExternalSite('https://www.google.com','','61x38pxls','','','','','')">Test</a>

Upvotes: 0

Views: 364

Answers (3)

Rodrigo Gauzmanf

Reputation: 2527

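    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;

    String html = "<a href=\"javascript:linkToExternalSite('https://www.google.com','','61x38pxls','','','','','')\">Test</a>";
    Document doc = Jsoup.parse(html);
    // First <a> element that has an href attribute
    Element e = doc.select("a[href]").first();
    String href = e.attr("href");
    // Splitting on single quotes puts the first quoted argument at index 1
    String[] parts = href.split("'");
    String url = parts[1];
    // Output: https://www.google.com
    System.out.println(url);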

Upvotes: 0

RanRag

Reputation: 49547

I am no Jsoup expert, but Jsoup is an HTML parser; you can't use it to parse the JavaScript inside the href attribute.

So, your approach should be to extract

"javascript:linkToExternalSite('https://www.google.com','','61x38pxls','','','','','')"

using Jsoup.

Then use a regular expression to fetch the URL from that string.
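A minimal sketch of the regular-expression step, assuming the href value has already been read with Jsoup (for example via link.attr("href")). The class name HrefUrlExtractor, the method extractUrl, and the pattern itself are made up for illustration; the pattern assumes the URL is the first single-quoted argument starting with http:// or https://.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HrefUrlExtractor {
    // Assumption: the URL is a single-quoted argument beginning with http:// or https://
    private static final Pattern QUOTED_URL = Pattern.compile("'(https?://[^']*)'");

    // Returns the first quoted URL in the href value, or empty if there is none
    static Optional<String> extractUrl(String href) {
        Matcher m = QUOTED_URL.matcher(href);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // The attribute value from the question, as Jsoup would return it
        String href = "javascript:linkToExternalSite('https://www.google.com','','61x38pxls','','','','','')";
        System.out.println(extractUrl(href).orElse("no URL found"));
    }
}
```

Unlike splitting on quotes and taking a fixed index, this returns an empty Optional instead of throwing when the href holds no quoted URL.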

Upvotes: 1

Hauke Ingmar Schmidt

Reputation: 11607

The href is an attribute, which you can access with the attr method of Jsoup's Element. This gives you the whole content of the attribute, so you still need some pattern matching to retrieve the URL.

Upvotes: 0
