Reputation: 283
I would appreciate it if you could tell me whether there is a Java class that extracts information from an HTML page according to an XML document.
Thanks
Upvotes: 0
Views: 885
Reputation: 1
I used HtmlUnit. For example, to log in to JIRA:
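import com.gargoylesoftware.htmlunit.WebClient;   // HtmlUnit 2.x package names
import com.gargoylesoftware.htmlunit.html.*;

// jname and jpasswd hold the JIRA user name and password
final WebClient webClient = new WebClient();
final HtmlPage page1 = webClient.getPage("https://jira/secure/Dashboard.jspa");
final HtmlForm form = page1.getFormByName("loginform");
final HtmlTextInput textField = form.getInputByName("os_username");
final HtmlPasswordInput pwd = form.getInputByName("os_password");
textField.setValueAttribute(jname);
pwd.setValueAttribute(jpasswd);
// Clicking the "Login" button submits the form and returns the resulting page
final HtmlPage page2 = (HtmlPage) form.getInputByValue("Login").click();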
Upvotes: 0
Reputation: 347184
Personally, I use Cobra.
It lets you treat HTML as XML by building a DOM, so you can use tools such as XPath.
Take a look at Java HTML Parser for examples.
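Cobra's own API is not shown here. As a minimal sketch of the "HTML as a DOM, then XPath" idea, the following uses only the JDK's javax.xml classes and assumes the page is already well-formed XHTML (real pages usually are not, which is where an HTML parser like Cobra comes in). The class name XPathOverHtml, its methods and the sample markup are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class XPathOverHtml {

    // Builds a DOM from well-formed XHTML using the JDK's XML parser.
    // Ordinary web pages are rarely well-formed; in that case an HTML parser
    // (such as Cobra) would have to produce the org.w3c.dom.Document instead.
    static Document parseXhtml(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse input as XML", e);
        }
    }

    // Evaluates an XPath expression and returns the trimmed text of each matching node.
    static List<String> extract(Document doc, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent().trim());
            }
            return texts;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body>"
                + "<div id='news'><b><a href='/a'>First headline</a></b>"
                + "<b><a href='/b'>Second headline</a></b></div>"
                + "</body></html>";
        Document doc = parseXhtml(page);
        // prints [First headline, Second headline]
        System.out.println(extract(doc, "//div[@id='news']/b/a"));
    }
}
```

The same extract call also works for attributes, e.g. "//a/@href" returns the link targets.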
Upvotes: 1
Reputation: 13051
You can use Jsoup. I use it, and it is very good for parsing HTML. Here is an example from the Jsoup site:
Example: fetch the Wikipedia homepage, parse it into a DOM, and select the headlines from the "In the news" section into a list of Elements:
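// Requires org.jsoup.Jsoup, org.jsoup.nodes.Document and org.jsoup.select.Elements; get() throws IOException
Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
Elements newsHeadlines = doc.select("#mp-itn b a");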
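The site example downloads the live page. To try the same selector offline, Jsoup can also parse a document from a string. A minimal sketch, assuming jsoup is on the classpath; the class name JsoupHeadlines and the sample markup (a stand-in for the real "In the news" box) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JsoupHeadlines {

    // Parses an HTML string (no network access) and returns the text of every
    // element matched by the CSS selector, e.g. "#mp-itn b a".
    static List<String> headlines(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        Elements matches = doc.select(cssQuery);
        List<String> texts = new ArrayList<>();
        for (Element e : matches) {
            texts.add(e.text());
        }
        return texts;
    }

    public static void main(String[] args) {
        // Hypothetical markup standing in for the Wikipedia "In the news" section
        String html = "<div id='mp-itn'><ul>"
                + "<li><b><a href='/wiki/A'>Story A</a></b> happened.</li>"
                + "<li><b><a href='/wiki/B'>Story B</a></b> happened.</li>"
                + "</ul></div>";
        // prints [Story A, Story B]
        System.out.println(headlines(html, "#mp-itn b a"));
    }
}
```

Because Elements is a list of Element objects, you can equally read other data per match, such as e.attr("href") for the link target.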
Upvotes: 3