Mark Peschel

Reputation: 334

What Java API data structure is good for HTML trees?

For fun, I'm writing a basic parser that finds data within an HTML document. I want to find the best structure to represent the branches of the parsed file. The criterion for "best structure" is this: I want to easily search for a tag by its relative location and access its contents, like "the image in the second image tag after the third h3 tag in the body" or "the title tag in the header".

I expect to search the first level of tags for the one I want, then descend into the branch under that tag. That is the kind of structure I'm after, but if there is a better way to find relative locations in an HTML document, please explain.

So that's the question. More generally, what kind of Java structures are available through the API that can represent tree data structures?

Upvotes: 0

Views: 225

Answers (1)

Nicolas Filotto

Reputation: 44995

Don't reinvent the wheel: use an HTML parser such as Jsoup. It lets you retrieve tags with a CSS selector through the method Element#select(cssQuery).

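// file is a java.io.File; encoding is a charset name such as "UTF-8"
Document doc = Jsoup.parse(file, encoding);
// cssQuery is any CSS selector, for example "head > title"
Elements elements = doc.select(cssQuery);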
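
As for the more general part of the question (tree structures available through the Java API itself), the JDK does ship one: the org.w3c.dom tree, which can be queried with javax.xml.xpath. Below is a minimal sketch, assuming the input is well-formed XHTML; the JDK parser rejects the sloppy markup found in most real pages, which is why Jsoup is usually the safer choice. The class name DomTreeSketch, the helper names and the sample markup are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class DomTreeSketch {

    // Parses well-formed XHTML into the JDK's org.w3c.dom tree.
    // Assumption: the markup is valid XML; tag-soup HTML will throw here.
    public static Document parse(String xhtml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xhtml)));
    }

    // Returns the first element matching the XPath expression, or null if none matches.
    public static Element find(Document doc, String xpathExpr) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (Element) xpath.evaluate(xpathExpr, doc, XPathConstants.NODE);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample page, not taken from the question.
        String page = "<html><head><title>Demo</title></head><body>"
                + "<h3>a</h3><h3>b</h3><h3>c</h3>"
                + "<img src='1.png'/><img src='2.png'/>"
                + "</body></html>";
        Document doc = parse(page);

        // "the title tag in the header"
        Element title = find(doc, "/html/head/title");
        // "the second image tag after the third h3 tag in the body"
        Element img = find(doc, "/html/body/h3[3]/following-sibling::img[2]");

        System.out.println(title.getTextContent()); // prints "Demo"
        System.out.println(img.getAttribute("src")); // prints "2.png"
    }
}
```

The XPath expressions play the same role as Jsoup's CSS selectors: they describe a relative location in the tree, so there is no need to walk the branches by hand.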

Upvotes: 1
