Reputation: 53806
Here is the code snippet I am using to read the link text "Test" from the HTML snippet below. How can I extract the URL https://www.google.com from the href attribute?
Elements e = doc.getElementsByAttribute("href");
Iterator<Element> href = e.iterator();
while (href.hasNext()) {
    Element link = href.next();
    String text = link.text();
}
<a href="javascript:linkToExternalSite('https://www.google.com','','61x38pxls','','','','','')">Test</a>
Upvotes: 0
Views: 364
Reputation: 2527
String html = "<a href=\"javascript:linkToExternalSite('https://www.google.com','','61x38pxls','','','','','')\">Test</a>";
Document doc = Jsoup.parse(html);
// Select the first anchor that has an href attribute
Element e = doc.select("a[href]").first();
String href = e.attr("href");
// Splitting on single quotes leaves the first quoted argument at index 1
String[] parts = href.split("'");
String url = parts[1];
// Output: https://www.google.com
System.out.println(url);
Upvotes: 0
Reputation: 49547
I am no Jsoup expert, but Jsoup is an HTML parser; you can't use it to parse the JavaScript inside the href attribute.
So, your approach should be to extract
"javascript:linkToExternalSite('https://www.google.com','','61x38pxls','','','','','')"
using Jsoup.
Then use a regular expression to extract the URL from that string.
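A minimal sketch of the regex step, assuming the attribute value has already been read with Jsoup and that the URL is the first single-quoted argument starting with http:// or https://. The class name HrefUrlExtractor and the method extractUrl are hypothetical names made up for illustration; only java.util.regex is used.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HrefUrlExtractor {
    // Assumption: the URL is a single-quoted argument that starts with http:// or https://
    private static final Pattern FIRST_QUOTED_URL = Pattern.compile("'(https?://[^']*)'");

    // Returns the first quoted URL found in the href value, or empty if there is none
    public static Optional<String> extractUrl(String href) {
        Matcher m = FIRST_QUOTED_URL.matcher(href);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // The attribute value from the question, as link.attr("href") would return it
        String href = "javascript:linkToExternalSite('https://www.google.com','','61x38pxls','','','','','')";
        System.out.println(extractUrl(href).orElse("no URL found"));
    }
}
```

Returning an Optional means a link whose href has no quoted URL (a plain "#", for example) can be skipped rather than failing with an index error.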
Upvotes: 1
Reputation: 11607
The href is an attribute, which you can access with the attr method of Jsoup's Element. This gives you the whole content of the attribute, so you still need some pattern matching to retrieve the URL.
Upvotes: 0