user1586205

Reputation: 283

Parsing HTML in Java to extract information

Is there a Java class that can extract information from an HTML page according to an XML document? I would appreciate any pointers.

Thanks

Upvotes: 0

Views: 885

Answers (3)

Eugene K

Reputation: 1

I used HtmlUnit:

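// Assumes jname and jpasswd hold the JIRA username and password.
final WebClient webClient = new WebClient();
final HtmlPage page1 = webClient.getPage("https://jira/secure/Dashboard.jspa");
// Locate the login form and its fields by name.
final HtmlForm form = page1.getFormByName("loginform");
final HtmlTextInput textField = form.getInputByName("os_username");
final HtmlPasswordInput pwd = form.getInputByName("os_password");
textField.setValueAttribute(jname);
pwd.setValueAttribute(jpasswd);
// Clicking the button submits the form and returns the resulting page.
final HtmlPage page2 = (HtmlPage) form.getInputByValue("Login").click();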

Upvotes: 0

MadProgrammer

Reputation: 347184

Personally, I use Cobra.

It allows you to treat HTML as XML by creating a DOM, so you can use tools such as XPath.

Take a look at Java HTML Parser for examples.
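
To illustrate the "DOM plus XPath" idea, here is a minimal sketch that uses only the JDK's built-in XML parser and `javax.xml.xpath`. It is not Cobra's API: the JDK parser only accepts well-formed markup, whereas an HTML parser such as Cobra is what would build the DOM from real-world pages. The class name `XPathOverDom`, the helper `extract`, and the sample markup are all made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of Cobra or any other library.
class XPathOverDom {

    // Parses well-formed markup into a W3C DOM and returns the text
    // content of every node matched by the given XPath expression.
    static List<String> extract(String wellFormedMarkup, String expression) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document dom = builder.parse(new InputSource(new StringReader(wellFormedMarkup)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression, dom, XPathConstants.NODESET);

        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Sample markup invented for this sketch; it must be well-formed XML.
        String page = "<html><body><h1>News</h1>"
                + "<ul><li class=\"item\">First</li><li class=\"item\">Second</li></ul>"
                + "</body></html>";
        System.out.println(extract(page, "//li[@class='item']")); // prints [First, Second]
    }
}
```

Once an HTML parser has produced an `org.w3c.dom.Document`, the XPath part should look the same; only the step that builds the DOM changes.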

Upvotes: 1

Jayyrus

Reputation: 13051

You can use Jsoup. I use it, and it is very good at parsing HTML. Here is an example from the Jsoup site:

Example: fetch the Wikipedia homepage, parse it to a DOM, and select the headlines from the "In the news" section into a list of Elements:

Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
Elements newsHeadlines = doc.select("#mp-itn b a");

Upvotes: 3
