Reputation: 2896
I am working on a project that requires some text manipulation on text that I obtain from web pages. The first step is to find a parser that extracts the body text and ignores the redundant information. I am not sure how to do this, since I am extremely new to programming. I would really appreciate any help I could get. Thanks in advance.
Upvotes: 0
Views: 6275
Reputation: 31
I found this HTML parser very useful. It also provides a sample: http://jericho.htmlparser.net/docs/index.html
Upvotes: 3
Reputation: 301
I am doing this right now using HTMLParser, available on SourceForge: http://sourceforge.net/projects/htmlparser/
It seems very easy and straightforward, but since you say you are new to this, here is an example with source code: http://kickjava.com/src/org/htmlparser/parserapplications/StringExtractor.java.htm
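If you want to see the general idea before downloading anything, here is a minimal sketch that uses the HTML parser bundled with the JDK (`javax.swing.text.html`), not HTMLParser itself. The class name `BodyTextSketch`, the method `extractText`, and the sample page are made up for illustration. It simply collects every piece of text the parser reports and drops the tags.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical example class: strips tags using the JDK's built-in HTML parser.
public class BodyTextSketch {

    /** Returns all text chunks the parser reports, joined by single spaces. */
    public static String extractText(Reader html) throws IOException {
        final StringBuilder out = new StringBuilder();

        // The parser calls handleText once for each run of text between tags.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(data);
            }
        };

        // 'true' tells the parser to ignore any charset declared in the page,
        // since the Reader has already decoded the characters.
        new ParserDelegator().parse(html, callback, true);
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample page; in practice you would read the page from a URL or file.
        String page = "<html><body><h1>Heading</h1>"
                    + "<p>Some <b>bold</b> text.</p></body></html>";
        System.out.println(extractText(new StringReader(page)));
    }
}
```

This keeps all text the parser hands over, so it does not try to tell the main article apart from menus, footers, and so on. For real pages, a dedicated library like the one above will probably give you cleaner results.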
Upvotes: 1