Reputation: 21
I have a series of HTML files that all share the same structure.
Take this example:
> <html>
> <head>
> <title>main page</title>
> </head>
> <body>
> <table><tr>
> <td>content1</td>
> </tr></table>
> </body>
> </html>
I want to extract the content of the title tag and the td tag. How can I do this using HtmlUnit? I am new to HtmlUnit. Please help me.
Upvotes: 0
Views: 2211
Reputation: 6921
See this instructive snippet from the HTMLUnit page.
In there you first construct a client, then retrieve your page, and finally ask for the title text (page.getTitleText()) or get the entire page as an HTML string (page.asXml()). You could then assertContains on that string.
There are plenty of other options, like retrieving elements by id. Best see the examples for yourself.
Upvotes: 1
Reputation: 120496
HtmlUnit is a testing system, not a DOM parser.
To parse HTML to a DOM, use http://about.validator.nu/htmlparser/ and its HtmlDocumentBuilder class.
Once you have a Document, you can call myDocument.getElementsByTagName("title") to find the title element.
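As a rough sketch of the getElementsByTagName step: the code below swaps in the JDK's built-in XML DocumentBuilder instead of HtmlDocumentBuilder, purely so it runs without extra dependencies. That only works because the sample page in the question happens to be well-formed XML; real-world HTML would need the HTML parser mentioned above. The class name TagTextExtractor and the helpers parse and textOf are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper class, not part of HtmlUnit or the validator.nu parser.
public class TagTextExtractor {

    // Assumption: the markup is well-formed XML, so the JDK XML parser accepts it.
    // For arbitrary HTML, build the Document with an HTML-aware parser instead.
    public static Document parse(String markup) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
    }

    // Collects the trimmed text content of every element with the given tag name.
    public static List<String> textOf(Document doc, String tagName) {
        NodeList nodes = doc.getElementsByTagName(tagName);
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            texts.add(nodes.item(i).getTextContent().trim());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        // The sample page from the question.
        String html = "<html><head><title>main page</title></head>"
                + "<body><table><tr><td>content1</td></tr></table></body></html>";
        Document doc = parse(html);
        System.out.println(textOf(doc, "title")); // prints [main page]
        System.out.println(textOf(doc, "td"));    // prints [content1]
    }
}
```

Since textOf only relies on the org.w3c.dom.Document interface, it should work unchanged on a Document produced by a different parser.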
Upvotes: 0