Mascarpone

Reputation: 2556

Parsing HTML from a web page

I have to extract some information from a web page, and reformat it for the user.

Since the web page is fairly regular, I currently use HttpClient to retrieve the HTML as a string, and then extract substrings at fixed locations to get the relevant data.

Anyhow, I'm wondering if there is a better way, maybe an HTML-aware one. How would you do it?

Cheers

Upvotes: 6

Views: 3280

Answers (4)

FolksLord

Reputation: 1000

I personally like to use the Jericho parser: http://jericho.htmlparser.net/docs/index.html

It is easy to use, has plenty of examples on the project's page, and deals well with poorly formed HTML (unclosed tags etc.).
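A minimal sketch of the Jericho approach, assuming Jericho 3.x (package `net.htmlparser.jericho`) is on the classpath. It parses an HTML string rather than fetching a URL so it runs offline; the class name `JerichoExtract` and the sample markup are made up for illustration, and the markup deliberately leaves `<p>` tags unclosed.

```java
import net.htmlparser.jericho.Element;
import net.htmlparser.jericho.HTMLElementName;
import net.htmlparser.jericho.Source;

import java.util.ArrayList;
import java.util.List;

public class JerichoExtract {

    // Returns "link text -> href" for every <a> element that has an href attribute.
    static List<String> extractLinks(String html) {
        Source source = new Source(html);
        // Parse the whole document up front; useful when scanning many elements.
        source.fullSequentialParse();
        List<String> out = new ArrayList<>();
        for (Element a : source.getAllElements(HTMLElementName.A)) {
            String href = a.getAttributeValue("href");
            if (href == null) {
                continue; // skip named anchors such as <a name="top">
            }
            // TextExtractor renders the element's content as plain text.
            String text = a.getTextExtractor().toString();
            out.add(text + " -> " + href);
        }
        return out;
    }

    public static void main(String[] args) {
        String html = "<p>Intro <a href='/a'>First</a> <a name='top'>no href</a>"
                + "<p><a href='/b'>Second</a>";
        extractLinks(html).forEach(System.out::println);
    }
}
```

Compared with substring offsets, the lookup is by element name and attribute, so it keeps working if whitespace or unrelated markup around the links changes.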

Upvotes: 3

bltc

Reputation: 371

jsoup (jsoup.org) is better, but Cobra also has some additional features (it is CSS-aware and JavaScript-aware).

Upvotes: 1

Speck

Reputation: 2299

We've used HTTPUnit to do this in the past.

Upvotes: 1

Computerish

Reputation: 9591

Ideally, you should use a real HTML parser. I've used Jsoup successfully in the past on Android:

http://jsoup.org/
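A minimal sketch of how Jsoup can replace fixed substring offsets with CSS selectors, assuming jsoup is on the classpath. The table id `products`, the `price` class, and the class name `JsoupExtract` are hypothetical stand-ins for whatever the real page uses. The HTML is parsed from a string so the example runs offline; against a live page, `Jsoup.connect(url).get()` returns a `Document` directly.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class JsoupExtract {

    // Returns the text of every <td class="price"> inside <table id="products">.
    static List<String> extractPrices(String html) {
        Document doc = Jsoup.parse(html);
        List<String> prices = new ArrayList<>();
        // CSS selector: descendant td.price cells of the table with id "products".
        for (Element cell : doc.select("table#products td.price")) {
            // text() returns the cell's text with whitespace normalized and trimmed.
            prices.add(cell.text());
        }
        return prices;
    }

    public static void main(String[] args) {
        // Deliberately sloppy markup: unclosed <td> and <tr> tags.
        String html = "<table id='products'>"
                + "<tr><td class='name'>Tea<td class='price'> 3.50 </td>"
                + "<tr><td class='name'>Coffee</td><td class='price'>4.20"
                + "</table>";
        System.out.println(extractPrices(html));
    }
}
```

Jsoup repairs the unclosed tags while parsing, so the selector still finds both price cells.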

Upvotes: 7
