Andrew

Reputation: 6394

Java: Parse html file and extract text

I want to parse an HTML file and store the bold text (the text inside <b> tags). One approach is to read the file line by line and use split or a regex. Does that mean I have to store the entire page in a String variable? If I don't, I have no guarantee that a tag's start and end are on the same line.

What solution do you suggest?

Upvotes: 2

Views: 4796

Answers (2)

camickr

Reputation: 324088

it is a project I have for university

Use HTMLEditorKit.ParserCallback
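
As a rough sketch of that suggestion (not necessarily what the answerer had in mind): the Swing parser reads from a Reader and reports start tags, end tags, and text through callbacks, so a <b> element that spans several lines needs no special handling and the page never has to be held in one String. The class name BoldTextExtractor and the method extractBold are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects the text found inside <b> elements.
class BoldTextExtractor {

    static List<String> extractBold(Reader reader) throws IOException {
        final List<String> boldTexts = new ArrayList<>();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private int boldDepth = 0;                        // > 0 while inside <b>
            private final StringBuilder current = new StringBuilder();

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.B) {
                    boldDepth++;
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.B && boldDepth > 0) {
                    boldDepth--;
                    if (boldDepth == 0) {
                        // Outermost </b> reached: store the collected text.
                        boldTexts.add(current.toString());
                        current.setLength(0);
                    }
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                // Text may arrive in several chunks, e.g. around nested tags.
                if (boldDepth > 0) {
                    current.append(data);
                }
            }
        };

        // The third argument tells the parser to ignore charset declarations in the page.
        new ParserDelegator().parse(reader, callback, true);
        return boldTexts;
    }
}
```

To read from a file, pass a reader such as new FileReader("page.html") (the file name is only a placeholder). The parser may normalize whitespace inside the text it reports, so check the output if exact spacing matters.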

Upvotes: 0

David

Reputation: 20063

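Use jsoup to parse the contents. It builds a document from the whole input, so it does not matter whether a tag opens and closes on the same line. Once you have the Document, doc.select("b") returns the <b> elements and text() gives their text.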

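import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

String html = "<html><head><title>First parse</title></head>"
  + "<body><p>Parsed HTML into a doc.</p></body></html>";

// Parse the HTML string into a document tree.
Document doc = Jsoup.parse(html);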

Upvotes: 5
