Reputation: 1272
I need to extract all tags and words from an HTML file, in the order in which they appear in the document. Here's an example of the file: one two thre What I want as output is an array or a list that looks like this: {"", "one", "two", "thre", ""} I know there are tools such as JTidy or Apache Tika, but they extract only the text (or only the tags) from a document. What should I do?
Upvotes: 0
Views: 90
Reputation: 37506
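Use the jsoup library for this. It makes HTML parsing in Java incredibly easy: it parses the file into a tree of element and text nodes, so walking that tree gives you the tags and the words in document order.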
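A rough sketch of how that could look, assuming jsoup is on the classpath. The class name `HtmlTokens`, the method `tokenize`, and the `<p>` tag in the sample input are made up for illustration (the question doesn't show which tags the file contains). The idea is to visit every node under `<body>`, emitting an opening-tag token on the way in, the words of each text node, and a closing-tag token on the way out.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.NodeVisitor;

// Hypothetical helper, not part of jsoup itself.
class HtmlTokens {

    // Returns tags and words in the order they occur in the document.
    static List<String> tokenize(String html) {
        final List<String> tokens = new ArrayList<>();
        Document doc = Jsoup.parse(html);

        NodeVisitor visitor = new NodeVisitor() {
            @Override
            public void head(Node node, int depth) {
                // Called when a node is first reached (before its children).
                if (node instanceof TextNode) {
                    for (String word : ((TextNode) node).text().trim().split("\\s+")) {
                        if (!word.isEmpty()) {
                            tokens.add(word);
                        }
                    }
                } else if (node instanceof Element) {
                    tokens.add("<" + node.nodeName() + ">");
                }
            }

            @Override
            public void tail(Node node, int depth) {
                // Called after all of the node's children have been visited.
                if (node instanceof Element) {
                    tokens.add("</" + node.nodeName() + ">");
                }
            }
        };

        // jsoup always builds html/head/body around the input; start below
        // <body> so that only the tags from the input itself are reported.
        for (Node child : doc.body().childNodes()) {
            child.traverse(visitor);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("<p>one two thre</p>"));
    }
}
```

For the sample input this should give the tokens `<p>`, `one`, `two`, `thre`, `</p>`. Note the simplifications: attributes are dropped, void elements such as `<br>` also get a closing token, and jsoup normalizes the markup while parsing, so the tags reflect the parsed tree and not the raw source text.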
Upvotes: 1