Reputation: 3927
<div></div>
<div></div>
<div></div>
<div></div>
<ul>
<form id=the_main_form method="post">
<li>
<div></div>
<div> <h2>
<a onclick="xyz;" target="_blank" href="http://sample.com" style="text-decoration:underline;">This is sample</a>
</h2></div>
<div></div>
<div></div>
</li>
There are 50 <li> elements like that.
This is a snippet of the HTML from a much larger page.
An empty <div></div> means there was content between the tags; I removed it because it is not relevant here.
What should the Jsoup select statement look like to extract the href and the link text?
I tried doc.select("div div div ul xxxx"); where xxxx is the form. Should I use the form's id, or how else should I do this?
Upvotes: 0
Views: 548
Reputation: 3843
Try this:
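import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
Elements allLis = doc.select("#the_main_form > li");
for (Element li : allLis) {
    // select() returns Elements, so take the first match as a single Element
    Element a = li.select("div:eq(1) > h2 > a").first();
    if (a == null) continue; // skip <li> items without a link in that position
    String href = a.attr("href");
    String text = a.text();
}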
Hope it helps!
EDIT:
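Elements allLis = doc.select("#the_main_form > li");
This part of the code gets all <li> tags that are direct children (>) of the <form> whose id is the_main_form (written #the_main_form in selector syntax). Since an id identifies a single element, you don't need the div div div ul prefix.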
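Element a = li.select("div:eq(1) > h2 > a").first();
Then we iterate over the <li> tags and find the <a> tag in each one: first the <div> that is the second child element of the <li> (:eq() counts from zero, so index 1 gives div:eq(1)), then the <h2> inside that <div>, which contains our <a> tag. Since select() returns a list (Elements), .first() takes the single matching link.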
Hope you understand now! :)
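To check the whole approach end to end, here is a minimal self-contained sketch that applies the same selectors to a trimmed-down copy of the markup from the question. It assumes jsoup is on the classpath; the class name FormLinkExtractor, the method extractLinks and the second <li> (http://example.org) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class FormLinkExtractor {

    // Returns one {href, text} pair per <li> that is a direct child of the form.
    public static List<String[]> extractLinks(String html) {
        Document doc = Jsoup.parse(html);
        List<String[]> links = new ArrayList<>();
        for (Element li : doc.select("#the_main_form > li")) {
            // :eq(1) is zero-based: the div that is the 2nd child element of the <li>
            Element a = li.select("div:eq(1) > h2 > a").first();
            if (a == null) {
                continue; // this <li> has no link in the expected place
            }
            links.add(new String[] {a.attr("href"), a.text()});
        }
        return links;
    }

    public static void main(String[] args) {
        // Trimmed-down markup from the question; the second <li> is hypothetical.
        String html = "<ul><form id=the_main_form method=\"post\">"
                + "<li><div></div><div><h2><a href=\"http://sample.com\">This is sample</a></h2></div><div></div></li>"
                + "<li><div></div><div><h2><a href=\"http://example.org\">Second item</a></h2></div><div></div></li>"
                + "</form></ul>";
        for (String[] link : extractLinks(html)) {
            System.out.println(link[0] + " -> " + link[1]);
        }
    }
}
```

On the real page you would pass the full HTML (or call Jsoup.connect(url).get() instead of Jsoup.parse) and get one pair per <li>.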
Upvotes: 1
Reputation: 8032
Please try this:
package com.stackoverflow.works;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
/*
* @ author: sarath_sivan
*/
public class HtmlParserService {
public static void parseHtml(String html) {
Document document = Jsoup.parse(html);
Element linkElement = document.select("a").first();
String linkHref = linkElement.attr("href"); // "http://sample.com"
String linkText = linkElement.text(); // "This is sample"
System.out.println(linkHref);
System.out.println(linkText);
}
public static void main(String[] args) {
String html = "<a onclick=\"xyz;\" target=\"_blank\" href=\"http://sample.com\" style=\"text-decoration:underline;\">This is sample</a>";
parseHtml(html);
}
}
This assumes the Jsoup library is on your classpath.
Thank you!
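Note that document.select("a").first() returns only the first <a> in the whole document, so on the full page it may pick up a link that is not one of the 50 list items. A minimal sketch that scopes the query to the form and collects every link, assuming (as in the question) that each link sits inside an <h2> within the form with id the_main_form; the class AllLinksParser, the method parseAllLinks and the second sample link are hypothetical names for illustration:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class AllLinksParser {

    // Returns every <a> that is a direct child of an <h2> anywhere inside the form.
    public static Elements parseAllLinks(String html) {
        Document document = Jsoup.parse(html);
        return document.select("#the_main_form h2 > a");
    }

    public static void main(String[] args) {
        // Sample markup modelled on the question; the second link is hypothetical.
        String html = "<ul><form id=the_main_form method=\"post\">"
                + "<li><div></div><div><h2><a href=\"http://sample.com\">This is sample</a></h2></div></li>"
                + "<li><div></div><div><h2><a href=\"http://example.org\">Second item</a></h2></div></li>"
                + "</form></ul>";
        for (Element link : parseAllLinks(html)) {
            System.out.println(link.attr("href") + " -> " + link.text());
        }
    }
}
```

The descendant selector #the_main_form h2 > a does not depend on the position of the <div>, so it keeps working if another div is added before it, but it would also match any other <h2> link elsewhere in the form.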
Upvotes: 1