Reputation: 245
I have a service which takes the user supplied rich text (can have HTML tags) and saves it into the database. That data gets used by some other application. But sometimes the user supplied data has missing HTML tags and wrong closing tags. I want to validate if the user supplied data is valid HTML or not and depending on that I want to warn the user.
Are there any java libraries to do HTML validation?
Upvotes: 5
Views: 3180
Reputation: 8062
You can use Jsoup, from the project README
Here is an example:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
...
String markup = "<body><head>...";
Jsoup.isValid(markup, null);
Instead of null
, you can pass a Whitelist
? object as second parameter to the isValid
method.
Plus, you can easily install this library using Gradle
Upvotes: 3
Reputation: 310919
There's a great thing called NekoHTML which is just a thin wrapper over the Apache Xerces parser that turns on error-recovery/correction. It doesn't validate so much as error-correct, so you can process the result as XML, i.e. run it through XPaths or XSLTs. It has worked flawlessly for me for several months on completely arbitrary HTML from 3rd-party sites.
Upvotes: 0
Reputation: 35961
You can try JTidy, but it's too slow for simple HTML cleaning.
If you want just process HTML you can try NekoHTML, it's lightweight and fast
Upvotes: 3