Reputation: 2556
I have to extract some information from a web page, and reformat it for the user.
Since the web page is fairly regular, I currently use HttpClient to retrieve the HTML as a string, and I extract substrings at fixed locations to get the relevant data.
Anyhow I'm wondering if there is a better way, maybe an HTML-aware way. How would you do it?
Cheers
Upvotes: 6
Views: 3280
Reputation: 1000
I personally like to use the Jericho parser: http://jericho.htmlparser.net/docs/index.html
It is easy to use, has many examples on the project's page, and copes well with imperfect HTML (unclosed tags, etc.).
Upvotes: 3
Reputation: 371
jsoup (jsoup.org) is better, but Cobra also has some additional features (it is CSS-aware and JavaScript-aware).
Upvotes: 1
Reputation: 9591
Ideally, you should use a real HTML parser. I've used Jsoup successfully in the past on Android.
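As a rough illustration of the idea (not the snippet originally posted here), a minimal sketch could look like the following. It assumes the jsoup library is on the classpath and that the HTML has already been fetched as a string, e.g. with HttpClient as in the question. The class name, the `extractText` helper, the sample markup and the `div.item span.price` selector are all made up for the example; you would substitute a selector that matches the page you are scraping.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class; requires the jsoup jar as a dependency.
public class JsoupExtractSketch {

    // Parses the HTML string and returns the text of every element
    // matching the given CSS selector, in document order.
    static List<String> extractText(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        for (Element el : doc.select(cssQuery)) {
            result.add(el.text());
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample page; note the second <div> is never closed,
        // which a lenient parser such as jsoup is designed to tolerate.
        String html = "<html><body>"
                + "<div class=\"item\"><span class=\"price\">12.50</span></div>"
                + "<div class=\"item\"><span class=\"price\">7.00</span>"
                + "</body></html>";

        // Selector is an assumption about the page layout.
        for (String price : extractText(html, "div.item span.price")) {
            System.out.println(price);
        }
    }
}
```

Compared with cutting substrings at fixed offsets, selecting by tag and class should keep working when unrelated parts of the page change length or order, as long as the selected elements keep their structure.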
Upvotes: 7