leba-lev

Reputation: 2896

HTML parser to extract text out of the body (in Java)

I am working on a project that requires some text manipulation on text I obtain from web pages. The first step is to find a parser that extracts the required body text while ignoring the redundant information. I am not sure how to do this, since I am extremely new to programming. I would really appreciate any help I could get. Thanks in advance.

Upvotes: 0

Views: 6275

Answers (2)

Amit Patil

Reputation: 31

I found the Jericho HTML parser very useful. It also comes with sample code: http://jericho.htmlparser.net/docs/index.html

Upvotes: 3

andli

Reputation: 301

I am doing exactly this right now using HTMLParser, available on SourceForge: http://sourceforge.net/projects/htmlparser/

It seems very easy and straightforward, but since you say you are new to this, here is an example with source code: http://kickjava.com/src/org/htmlparser/parserapplications/StringExtractor.java.htm
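
To give a feel for what "extract the body text" looks like in code, here is a minimal sketch. Note that it does not use HTMLParser or Jericho; it swaps in the HTML parser that ships with the JDK (`javax.swing.text.html`) so it runs without extra jars. The class name `BodyTextExtractor` and the method `extractBodyText` are made up for illustration, and joining text chunks with single spaces is an assumption about the output you want.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class, not part of HTMLParser or Jericho.
final class BodyTextExtractor {

    /** Collects the text found between <body> and </body>, chunks joined by one space. */
    static String extractBodyText(String html) {
        final StringBuilder text = new StringBuilder();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private boolean inBody = false;

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.BODY) {
                    inBody = true;   // start collecting text
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.BODY) {
                    inBody = false;  // stop collecting text
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (!inBody) {
                    return;          // skip text outside the body, e.g. the <title>
                }
                String chunk = new String(data).trim();
                if (chunk.isEmpty()) {
                    return;
                }
                if (text.length() > 0) {
                    text.append(' ');
                }
                text.append(chunk);
            }
        };

        try {
            // 'true' tells the parser to ignore charset declarations in the document.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Ignored</title></head>"
                + "<body><h1>Hello</h1><p>Some <b>body</b> text.</p></body></html>";
        System.out.println(extractBodyText(html));
    }
}
```

The built-in parser is old and fairly lenient, so for real-world pages one of the dedicated libraries linked in the answers is probably the better choice; the callback idea (react to tags, collect text) carries over.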

Upvotes: 1
