Dominik Filipiak

Reputation: 1272

Parsing words and tags from HTML in Java

I need to extract all tags and words, in the order they appear, from an HTML file. Here's an example of the file:

one two thre

What I want as output is an array or a list that looks like this:

{"", "one", "two", "thre", ""}

I know there are tools such as JTidy or Apache Tika, but these extract only the text (or only the tags) from a document. What should I do?

Upvotes: 0

Views: 90

Answers (1)

Mike Thomsen

Reputation: 37506

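Use the jsoup library for this. It makes HTML parsing in Java incredibly easy: it parses the file into a tree of element and text nodes that you can walk in document order, so you get both the tags and the words.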
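For illustration, here is a minimal sketch of how that could look with jsoup's `Node.traverse` and `NodeVisitor`. The class name `TagAndWordExtractor`, the method `extract`, and the sample input are made up for this example (the markup in the question did not survive), and the choice to emit tags as `<name>` / `</name>` strings is an assumption about the desired output. It requires the jsoup jar on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.NodeVisitor;

// Hypothetical helper, not part of jsoup: collects tags and words in document order.
public class TagAndWordExtractor {

    static List<String> extract(String html) {
        List<String> tokens = new ArrayList<>();
        Document doc = Jsoup.parse(html);

        // Depth-first walk of everything under <body>.
        // head() fires when a node is entered, tail() when it is left.
        doc.body().traverse(new NodeVisitor() {
            @Override
            public void head(Node node, int depth) {
                if (node instanceof TextNode) {
                    // Split the text on whitespace so each word becomes its own token.
                    for (String word : ((TextNode) node).text().trim().split("\\s+")) {
                        if (!word.isEmpty()) {
                            tokens.add(word);
                        }
                    }
                } else if (node instanceof Element && depth > 0) {
                    // depth == 0 is <body> itself, which jsoup adds; skip it.
                    tokens.add("<" + ((Element) node).tagName() + ">");
                }
            }

            @Override
            public void tail(Node node, int depth) {
                if (node instanceof Element && depth > 0) {
                    tokens.add("</" + ((Element) node).tagName() + ">");
                }
            }
        });
        return tokens;
    }

    public static void main(String[] args) {
        // Sample input is an assumption; substitute the contents of your file.
        System.out.println(extract("<p>one <b>two</b> three</p>"));
    }
}
```

Note that this sketch emits a closing token for every element, including void ones such as `<br>`, and drops attributes; both would need adjusting if the exact original markup matters. To read from a file instead of a string, `Jsoup.parse(File, String charsetName)` can be used in place of `Jsoup.parse(String)`.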

Upvotes: 1
