Reputation: 5976
I have a project that generates HTML pages using Velocity templates and Java, but most of the pages do not comply with W3C standards. How can I validate those HTML pages and get a log that tells me which errors and warnings occur on which pages?
I can then fix the errors manually. I have tried JTidyFilter, but it doesn't work for me.
Upvotes: 5
Views: 5500
Reputation: 15769
The official W3C Markup Validator Web Service API, available since 2007, lets you call a local or remote W3C checker.
The w3cValidator library offers a single-class Java solution that uses Jersey and MOXy JAXB to read the SOAP response.
Add this Maven dependency to use it:
    <dependency>
      <groupId>com.bitplan</groupId>
      <artifactId>w3cValidator</artifactId>
      <version>0.0.2</version>
    </dependency>
Here is a JUnit test for trying it:
    /**
     * The URL of the official W3C markup validation service.
     * If you'd like to run the tests against your own installation you might want to modify this.
     */
    public static final String url = "http://validator.w3.org/check";

    /**
     * Test the w3cValidator interface with some HTML code
     * @throws Exception
     */
    @Test
    public void testW3CValidator() throws Exception {
        String preamble =
            "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\"\n" +
            " \"http://www.w3.org/TR/html4/loose.dtd\">\n" +
            "<html>\n" +
            " <head>\n" +
            " <meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">\n" +
            " <title>test</title>\n" +
            " </head>\n" +
            " <body>\n";
        String footer = " </body>\n" +
            "</html>\n";
        // Two deliberately invalid documents: each contains an unclosed <div>.
        String[] htmls = {
            preamble +
            " <div>\n" +
            footer,
            "<!DOCTYPE html><html><head><title>test W3CChecker</title></head><body><div></body></html>"
        };
        // Minimum number of errors and warnings expected for each document.
        int[] expectedErrs = {1, 2};
        int[] expectedWarnings = {1, 2};
        int index = 0;
        System.out.println("Testing " + htmls.length + " html messages via " + url);
        for (String html : htmls) {
            W3CValidator checkResult = W3CValidator.check(url, html);
            List<ValidationError> errlist = checkResult.body.response.errors.errorlist;
            List<ValidationWarning> warnlist = checkResult.body.response.warnings.warninglist;
            Object first = errlist.get(0);
            assertTrue("if first is a string, then moxy is not activated",
                first instanceof ValidationError);
            //System.out.println(first.getClass().getName());
            //System.out.println(first);
            System.out.println("Validation result for test " + (index + 1) + ":");
            for (ValidationError err : errlist) {
                System.out.println("\t" + err.toString());
            }
            for (ValidationWarning warn : warnlist) {
                System.out.println("\t" + warn.toString());
            }
            System.out.println();
            assertTrue(errlist.size() >= expectedErrs[index]);
            assertTrue(warnlist.size() >= expectedWarnings[index]);
            index++;
        }
    } // testW3CValidator
Instructions are also available for running your own W3C validator on an Ubuntu Linux system.
Upvotes: 0
Reputation: 105083
You can use the W3C validator directly from Java; see w3c-jabi.
Upvotes: 5
Reputation: 5976
After extensive research and a bit of code hacking, I've managed to use JTidyFilter in my project, and it is working beautifully now. JTidyFilter is part of JTidyServlet, a sub-project of JTidy written about five years ago. The code was recently updated to compile with Java 5. I downloaded the source, upgraded some dependencies and, most importantly, changed a few lines in the JTidyFilter class that handles the filtering, and finally got it to work nicely in my project.
There are still some issues with how the HTML is reformatted: the Firefox HTML validation plugin reports one or two errors on some pages, but otherwise most pages pass validation.
Upvotes: 1
Reputation: 744
There is also an experimental API available from W3C to help automate validation. They kindly ask that you throttle requests, and also offer instructions on setting up a validator on a local server. It's definitely more work, but if you're generating a lot of HTML pages, it would probably make sense to also automate the validation.
http://validator.w3.org/docs/api.html
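As a rough illustration of automating this, here is a minimal sketch that posts each generated page to a validator and prints one log line per page. It assumes Java 11+ for `java.net.http`, and assumes the validator accepts a form-encoded POST with the `fragment` and `output=soap12` parameters and reports results in the `X-W3C-Validator-Status`, `X-W3C-Validator-Errors` and `X-W3C-Validator-Warnings` response headers as the API page describes. The class name, method names and the one-second pause are made up for this example; check them against the validator you actually run.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of any W3C library.
class ValidationLogSketch {

    /** Builds the form body: page source as "fragment", SOAP 1.2 output requested. */
    public static String buildFormBody(String html) {
        return "fragment=" + URLEncoder.encode(html, StandardCharsets.UTF_8)
                + "&output=soap12";
    }

    /** Formats one log line; a missing header value is shown as "?". */
    public static String logLine(String page, String status, String errors, String warnings) {
        return String.format("%s: status=%s errors=%s warnings=%s",
                page, orUnknown(status), orUnknown(errors), orUnknown(warnings));
    }

    private static String orUnknown(String value) {
        return value == null ? "?" : value;
    }

    /** Sends one page to the validator and summarizes the result headers. */
    public static String check(HttpClient client, String validatorUrl, Path page)
            throws IOException, InterruptedException {
        String html = Files.readString(page, StandardCharsets.UTF_8);
        HttpRequest request = HttpRequest.newBuilder(URI.create(validatorUrl))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(buildFormBody(html)))
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return logLine(page.toString(),
                response.headers().firstValue("X-W3C-Validator-Status").orElse(null),
                response.headers().firstValue("X-W3C-Validator-Errors").orElse(null),
                response.headers().firstValue("X-W3C-Validator-Warnings").orElse(null));
    }

    public static void main(String[] args) throws Exception {
        // args[0] = validator URL (ideally a local install), remaining args = HTML files
        if (args.length < 2) {
            System.err.println("usage: ValidationLogSketch <validatorUrl> <file.html>...");
            return;
        }
        HttpClient client = HttpClient.newHttpClient();
        for (int i = 1; i < args.length; i++) {
            System.out.println(check(client, args[0], Path.of(args[i])));
            Thread.sleep(1000); // assumed pause between requests, to throttle as W3C asks
        }
    }
}
```

Only the header summary is logged here; the individual error messages are in the SOAP response body, which would need to be parsed separately. Pointing the first argument at a locally installed validator avoids loading the public service when there are many pages.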
Upvotes: 2