chetu

Reputation: 245

Are there any Java libraries for validating user-supplied HTML on the server side?

I have a service that takes user-supplied rich text (which can contain HTML tags) and saves it to the database. That data is then used by another application. Sometimes the user-supplied data has missing HTML tags or wrong closing tags. I want to check whether the user-supplied data is valid HTML and warn the user if it is not.

Are there any Java libraries that do HTML validation?

Upvotes: 5

Views: 3180

Answers (5)

rodrigoalvesvieira

Reputation: 8062

You can use Jsoup.

Here is an example:

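import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist; // named Safelist in newer jsoup releases
...
String markup = "<body><head>...";
// true only if the markup stays within the tags and attributes the whitelist allows
boolean valid = Jsoup.isValid(markup, Whitelist.basic());

The second parameter to isValid should be a Whitelist object (named Safelist in newer jsoup releases) describing the allowed tags and attributes, rather than null.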

You can also add this library to your build with Gradle.

Upvotes: 3

user207421

Reputation: 310919

There's a great thing called NekoHTML which is just a thin wrapper over the Apache Xerces parser that turns on error-recovery/correction. It doesn't validate so much as error-correct, so you can process the result as XML, i.e. run it through XPaths or XSLTs. It has worked flawlessly for me for several months on completely arbitrary HTML from 3rd-party sites.
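To illustrate the "process the result as XML" step, here is a minimal sketch that uses only the JDK's built-in XML and XPath classes. It assumes the error-corrected markup is already available as a well-formed string; the class name CorrectedHtmlXPath, the method linkTargets, and the sample markup are made up for the example, and no NekoHTML API is shown. Element-name case and namespaces in the real output depend on how the parser is configured, so the XPath expression may need adjusting.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: runs an XPath query over markup that is already well-formed XML.
class CorrectedHtmlXPath {

    // Returns the href value of every <a> element, in document order.
    static List<String> linkTargets(String wellFormedMarkup) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPE declarations, since the markup may come from untrusted sources.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(wellFormedMarkup)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList hrefs = (NodeList) xpath.evaluate("//a/@href", doc, XPathConstants.NODESET);

            List<String> result = new ArrayList<>();
            for (int i = 0; i < hrefs.getLength(); i++) {
                result.add(hrefs.item(i).getNodeValue());
            }
            return result;
        } catch (Exception e) {
            // Wrap checked parser/XPath exceptions to keep the sketch short.
            throw new IllegalStateException("Could not query markup", e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the output of an error-correcting parser.
        String corrected =
            "<html><body><p>See <a href=\"/a\">one</a> and <a href=\"/b\">two</a>.</p></body></html>";
        System.out.println(linkTargets(corrected));
    }
}
```

The same Document object could be handed to an XSLT Transformer instead of an XPath query.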

Upvotes: 0

Ms2ger

Reputation: 15983

In my opinion, the best option is Validator.nu, which implements the HTML5 spec.

Upvotes: 1

Igor Artamonov

Reputation: 35961

You can try JTidy, but it's too slow for simple HTML cleaning.

If you just want to process HTML, you can try NekoHTML; it's lightweight and fast.

Upvotes: 3

Desintegr

Reputation: 7090

You can try JTidy.

JTidy is a Java port of HTML Tidy, an HTML syntax checker and pretty printer.
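To show the kind of problem a syntax checker reports (unclosed or mismatched tags), here is a rough sketch that does not use JTidy at all. It treats the fragment as strict XML and parses it with the JDK's built-in parser, so it only suits XHTML-style input: valid HTML such as a bare <br> or an &nbsp; entity would be reported as an error. The class name MarkupCheck, the method firstProblem, and the wrapping root element are made up for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: a strict well-formedness check, not a full HTML validator.
class MarkupCheck {

    // Returns the parser's message for the first fatal problem, or empty if the fragment parses.
    static Optional<String> firstProblem(String fragment) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPE declarations, since the input is user-supplied.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            // Wrap in a made-up root element so fragments with several top-level tags are accepted.
            builder.parse(new InputSource(new StringReader("<root>" + fragment + "</root>")));
            return Optional.empty();
        } catch (SAXException e) {
            return Optional.of(String.valueOf(e.getMessage()));
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("XML parser unavailable", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstProblem("<p>Hello <b>world</b></p>").isPresent()); // false
        System.out.println(firstProblem("<p>Hello <b>world</p>").isPresent());     // true
    }
}
```

A service could call firstProblem before saving and show the returned message as the warning to the user; for real-world HTML, a dedicated HTML parser is the safer choice.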

Upvotes: 3
