user576249

Reputation: 31

html search and replace preserving html tags

I'm looking for a Java-based HTML parser that can search and replace text while preserving the HTML tags. This question has been asked here before, but the answers do not seem to solve the problem. I downloaded a few HTML parsers and wrote simple programs to see whether they could do the job, including jsoup, Jericho, and Java HTML Parser. They can all search, but I found no way to replace text while preserving the HTML tags.

I have read the complete threads for these questions:

How to find/replace text in html while preserving html tags/structure

html search and replace on server side

If no such parser exists today, what is the best way to implement one? If you have already done something like this, can you share the code?

Upvotes: 2

Views: 1149

Answers (2)

Mike Samuel

Reputation: 120516

The Caja parser uses libhtmlparser, an HTML5 parser that deals well with tag soup containing embedded XML subtrees, to produce an org.w3c.dom.DocumentFragment. Caja also has a renderer that produces well-formed HTML.

The parser code is at http://code.google.com/p/google-caja/source/browse/trunk/src/com/google/caja/parser/html/DomParser.java

The renderer code is at http://code.google.com/p/google-caja/source/browse/trunk/src/com/google/caja/parser/html/Nodes.java
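
As a rough illustration of the idea (not Caja's own API): once a parser has produced an org.w3c.dom tree, a replace that preserves tags can be sketched by walking the tree and rewriting only the text nodes. The class name TextNodeReplacer and its helper methods are hypothetical, and the JDK's XML parser stands in for a real HTML parser here, so the sample input is assumed to be well-formed XHTML.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only.
public class TextNodeReplacer {

    /** Replaces every occurrence of target in text nodes; elements and attributes are left alone. */
    public static void replaceInText(Node node, String target, String replacement) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            node.setNodeValue(node.getNodeValue().replace(target, replacement));
            return;
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            replaceInText(child, target, replacement);
        }
    }

    /** Stand-in parser: assumes well-formed input. A tag-soup parser would go here in real use. */
    public static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    /** Serializes the tree back to markup without an XML declaration. */
    public static String serialize(Node node) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize DOM", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<p class=\"bold\">bold and <b>bold</b> text</p>");
        replaceInText(doc.getDocumentElement(), "bold", "strong");
        System.out.println(serialize(doc));
    }
}
```

In this sketch the class attribute keeps its value "bold" because attribute values are not text nodes, while both visible occurrences of the word are replaced. A match that straddles a tag boundary (for example "bo<b>ld</b>") is not found, since each text node is searched on its own; handling that case would need extra logic to join and re-split adjacent text.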

Upvotes: 1

Jochen Bedersdorfer

Reputation: 4122

The Jericho parser might help you. It has been around for a long time and works with malformed HTML. http://jericho.htmlparser.net/docs/index.html

Upvotes: 1
