chai

Reputation: 1483

How to parse and modify an HTML file in Java

I am doing a project wherein I need to read an HTML file and identify specific tags, modify the contents of the tag, and create a new HTML file. Is there a library that parses HTML tags and is capable of writing the tags back to a new file?

Upvotes: 8

Views: 10913

Answers (4)

ivy

Reputation: 5559

There are many HTML parsers to choose from. You could use JTidy or NekoHTML, or check out TagSoup.

I usually prefer parsing XHTML with the standard Java XML parsers, but that only works for well-formed markup, not for arbitrary HTML.
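For the XHTML case, here is a minimal sketch using only the JDK's built-in DOM parser and serializer. The class and method names (XhtmlDomEdit, replaceTagText) are made up for illustration, and it assumes the input is well-formed XML without a namespace or DOCTYPE; ordinary tag-soup HTML will make the parser throw.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any library.
class XhtmlDomEdit {

    // Replaces the text of every element named tagName and returns the serialized document.
    static String replaceTagText(String xhtml, String tagName, String newText) {
        try {
            // Parse the well-formed markup into a DOM tree.
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

            // Find the tags of interest and change their contents.
            NodeList matches = doc.getElementsByTagName(tagName);
            for (int i = 0; i < matches.getLength(); i++) {
                matches.item(i).setTextContent(newText);
            }

            // Write the modified tree back out as XML, without the <?xml ...?> header.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.METHOD, "xml");
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse or serialize the document", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body><h1>Old title</h1><p>Body text</p></body></html>";
        System.out.println(replaceTagText(page, "h1", "New title"));
    }
}
```

To work with files instead of strings, DocumentBuilder.parse also accepts a java.io.File, and StreamResult can be constructed from a File. If the file has an XHTML DOCTYPE, the parser may try to load the DTD, which can be slow or fail when offline.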

Upvotes: 2

Victor Ionescu

Reputation: 2019

Check out http://jsoup.org. It has a friendly DOM-like API, so for simple tasks you don't need to parse the HTML yourself.

Upvotes: 7

Igor Konoplyanko

Reputation: 9374


If you want to modify a web page and return the modified content, I think the best way is to use an XSL transformation.
http://en.wikipedia.org/wiki/XSLT
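A rough sketch of what that could look like with the XSLT 1.0 processor that ships with the JDK (javax.xml.transform). The stylesheet is an identity transform with one extra template that rewrites the contents of h1. The class name XsltEdit and the choice of h1 are only for illustration, and this assumes the input is well-formed and its elements are not in a namespace.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper, not part of any library.
class XsltEdit {

    // Identity transform plus one override: copy everything as-is,
    // but replace the children of every <h1> with new text (attributes are kept).
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match='h1'>"
      + "<xsl:copy><xsl:apply-templates select='@*'/>New title</xsl:copy>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies STYLESHEET to the given markup and returns the result as a string.
    static String transform(String xhtml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xhtml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body><h1 class=\"top\">Old title</h1><p>Body text</p></body></html>";
        System.out.println(transform(page));
    }
}
```

The stylesheet would normally live in its own .xsl file. Keep in mind that XSLT only accepts well-formed XML as input, so plain HTML would first have to be cleaned up by one of the HTML parsers mentioned in the other answers.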

Upvotes: 2

Matt Phillips

Reputation: 11529

Look at http://java-source.net/open-source/html-parsers for a list of Java libraries that parse HTML files into Java objects that can be manipulated.

If the HTML files you are working with are well-formed (XHTML), you can also use the XML libraries in Java to find particular tags and modify them. The I/O itself should be handled by whichever library you are using.

If you choose to parse the strings manually, you could use regular expressions to find particular tags and the Java I/O libraries to write the new HTML documents to files. But this approach reinvents the wheel, so to speak: you have to manage tag opening and closing yourself, and all of that is already handled by existing libraries.
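To show what the manual route involves, here is a small sketch that rewrites the text of a title tag with java.util.regex. The class and method names are invented for this example. It only copes with a simple, non-nested tag, which is exactly the limitation described above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class RegexTagEdit {

    // Matches <title ...>...</title>, ignoring case and allowing line breaks in the content.
    // Group 1 is the opening tag, group 2 the old content, group 3 the closing tag.
    private static final Pattern TITLE = Pattern.compile(
        "(<title\\b[^>]*>)(.*?)(</title>)",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the HTML with the content of every <title> element replaced by newTitle.
    static String replaceTitle(String html, String newTitle) {
        Matcher matcher = TITLE.matcher(html);
        // quoteReplacement keeps '$' and '\' in newTitle from being read as group references.
        return matcher.replaceAll("$1" + Matcher.quoteReplacement(newTitle) + "$3");
    }

    public static void main(String[] args) {
        String page = "<html><head><title>Old</title></head><body></body></html>";
        System.out.println(replaceTitle(page, "New"));
    }
}
```

Reading and writing the files can be done with java.nio.file.Files.readString and Files.writeString (Java 11 and later). This breaks down quickly for nested tags, comments, or a "<" inside attribute values, which is why a real parser is usually the better choice.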

Upvotes: 0
