Reputation: 1267
I am using Jsoup to parse a site, and I want to get all the links from the following HTML:
<ul class="detail-main-list">
<li>
<a href="/manga/toki_wa/v01/c001/1.html" title="Toki wa... Vol.01 Ch.001 -Toki wa..." target="_blank"> Dis Be the link</a>
</li>
</ul>
Any idea how?
Upvotes: 0
Views: 681
Reputation: 1143
You can read the href of every <a> tag on a page like this:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Main {
    public static void main(String[] args) {
        String htmlString = "<html>\n" +
            " <head></head>\n" +
            " <body>\n" +
            "<ul class=\"detail-main-list\">\n" +
            " <li>\n" +
            " <a href=\"/manga/toki_wa/v01/c001/1.html\" title=\"Toki wa... Vol.01 Ch.001 -Toki wa...\" target=\"_blank\"> Dis Be the link</a>\n" +
            " </li>\n" +
            "</ul>" +
            " </body>\n" +
            "</html>";
        // Parse the markup and select every <a> element in the document.
        Document html = Jsoup.parse(htmlString);
        Elements elements = html.select("a");
        for (Element element : elements) {
            // Print the raw value of the href attribute.
            System.out.println(element.attr("href"));
        }
    }
}
Output:
/manga/toki_wa/v01/c001/1.html
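Note that the printed href is relative to the site root. A minimal sketch of turning it into an absolute URL with the standard java.net.URI class follows. The page address https://example.com/manga/toki_wa/ is only a placeholder, since the question does not name the real site, and the class and method names are made up for illustration:

```java
import java.net.URI;

class ResolveHref {
    // Resolves an href against the URL of the page it was found on.
    static String toAbsolute(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        // Placeholder page URL; substitute the address you actually fetched.
        String pageUrl = "https://example.com/manga/toki_wa/";
        System.out.println(toAbsolute(pageUrl, "/manga/toki_wa/v01/c001/1.html"));
        // prints https://example.com/manga/toki_wa/v01/c001/1.html
    }
}
```

An href that starts with a slash replaces the whole path of the page URL, which is why only the host survives from the placeholder address.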
Upvotes: 1
Reputation: 102933
Straight from jsoup.org, right there, first thing you see:
Document doc = Jsoup.connect("https://en.wikipedia.org/").get();
log(doc.title());
Elements newsHeadlines = doc.select("#mp-itn b a");
for (Element headline : newsHeadlines) {
    log("%s\n\t%s",
        headline.attr("title"), headline.absUrl("href"));
}
Modifying this to what you need seems trivial:
// Replace the URL with the address of the page you actually want to scrape.
Document doc = Jsoup.connect("https://en.wikipedia.org/").get();
Elements anchorTags = doc.select("ul.detail-main-list a");
for (Element anchorTag : anchorTags) {
    System.out.println("Links to: " + anchorTag.attr("href"));
    System.out.println("In absolute form: " + anchorTag.absUrl("href"));
    System.out.println("Text content: " + anchorTag.text());
}
The ul.detail-main-list a part is a so-called selector string. A really short tutorial on these:
foo means: any HTML element with that tag name, i.e. <foo></foo>
.bar means: any HTML element with class bar, i.e. <foo class="bar baz"></foo>
#bar means: any HTML element with id bar, i.e. <foo id="bar">
ul.detail-main-list matches any <ul> tags that have the string detail-main-list in their list of classes.
a b means: all things matching the 'b' selection that have something matching 'a' as an ancestor, at any depth. So ul a matches all <a> tags that have a <ul> tag around them someplace.
The jsoup docs are excellent.
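To make the selector rules above concrete, here is a small sketch that runs ul.detail-main-list a against markup like the one in the question, without touching the network. It assumes jsoup is on the classpath. The base URI https://example.com/ is a placeholder (the question does not name the site), the title attribute is left out for brevity, the second link is added only to show what the selector skips, and the class and method names are invented for the example:

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class DetailListLinks {
    // Returns the absolute URL of every <a> nested anywhere inside
    // a <ul> whose class list contains detail-main-list.
    static List<String> extractLinks(String html, String baseUri) {
        // The base URI is what lets absUrl() resolve relative hrefs.
        Document doc = Jsoup.parse(html, baseUri);
        List<String> links = new ArrayList<>();
        for (Element anchor : doc.select("ul.detail-main-list a")) {
            links.add(anchor.absUrl("href"));
        }
        return links;
    }

    public static void main(String[] args) {
        String html =
            "<ul class=\"detail-main-list\">"
          + "<li><a href=\"/manga/toki_wa/v01/c001/1.html\">Dis Be the link</a></li>"
          + "</ul>"
          + "<p><a href=\"/elsewhere\">outside the list, not matched</a></p>";

        // Placeholder base URI; use the address of the page you parsed.
        for (String link : extractLinks(html, "https://example.com/")) {
            System.out.println(link);
        }
    }
}
```

Only the link inside the list is printed, because the second <a> has no ul.detail-main-list ancestor. When the document comes from Jsoup.connect(...).get(), the base URI is set from the fetched URL, so no explicit base is needed.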
Upvotes: 2