newguy

Reputation: 5976

How to validate that HTML matches W3C standards

I have a project that generates HTML pages using a Velocity template and Java. However, most of the pages do not comply with W3C standards. How can I validate these HTML pages and get a log telling me which errors and warnings occur on which pages?

Then I can fix the errors manually. I have tried JTidyFilter, but that doesn't work for me.

Upvotes: 5

Views: 5500

Answers (4)

Wolfgang Fahl

Reputation: 15769

Since 2007, the official Markup Validator Web Service API has allowed you to call a local or remote W3C checker.

The w3cValidator library has a single-class Java solution that uses Jersey and MOXy JAXB to read the SOAP response.

This is the Maven dependency to use it:

<dependency>
  <groupId>com.bitplan</groupId>
  <artifactId>w3cValidator</artifactId>
  <version>0.0.2</version>
</dependency>

Here is a JUnit test for trying it:

/**
 * The URL of the official W3C markup validation service.
 * If you'd like to run the tests against your own installation you might want to modify this.
 */
public static final String url = "http://validator.w3.org/check";

/**
 * Test the w3cValidator interface with some HTML code
 * @throws Exception
 */
@Test
public void testW3CValidator() throws Exception {

    String preamble =
            "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\"\n" +
            "   \"http://www.w3.org/TR/html4/loose.dtd\">\n" +
            "<html>\n" +
            "  <head>\n" +
            "    <meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">\n" +
            "    <title>test</title>\n" +
            "  </head>\n" +
            "  <body>\n";

    String footer = "  </body>\n" +
            "</html>\n";

    String[] htmls = {
            preamble +
            "    <div>\n" +
            footer,
            "<!DOCTYPE html><html><head><title>test W3CChecker</title></head><body><div></body></html>"
    };
    int[] expectedErrs = {1, 2};
    int[] expectedWarnings = {1, 2};
    int index = 0;
    System.out.println("Testing " + htmls.length + " html messages via " + url);
    for (String html : htmls) {
        W3CValidator checkResult = W3CValidator.check(url, html);
        List<ValidationError> errlist = checkResult.body.response.errors.errorlist;
        List<ValidationWarning> warnlist = checkResult.body.response.warnings.warninglist;
        Object first = errlist.get(0);
        assertTrue("if first is a string, then MOXy is not activated",
                   first instanceof ValidationError);
        //System.out.println(first.getClass().getName());
        //System.out.println(first);
        System.out.println("Validation result for test " + (index+1) + ":");
        for (ValidationError err:errlist) {
            System.out.println("\t" + err.toString());
        }
        for (ValidationWarning warn:warnlist) {
            System.out.println("\t" + warn.toString());
        }
        System.out.println();
        assertTrue(errlist.size() >= expectedErrs[index]);
        assertTrue(warnlist.size() >= expectedWarnings[index]);
        index++;
    }
} // testW3CValidator

You can also run your own W3C validator on an Ubuntu Linux system.
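For the per-page log the question asks for, the errors and warnings printed in the test above could instead be collected per page. The following is only a minimal sketch using the standard library; `ValidationLog` and its methods are hypothetical names of mine and are not part of the w3cValidator library.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical per-page log; not part of the w3cValidator library. */
final class ValidationLog {

    // LinkedHashMap keeps pages in the order they were first reported.
    private final Map<String, List<String>> messagesByPage = new LinkedHashMap<>();

    /** Records one message (for example the toString() of an error) for a page. */
    void add(String page, String severity, String message) {
        messagesByPage.computeIfAbsent(page, key -> new ArrayList<>())
                      .add(severity + ": " + message);
    }

    /** Number of messages recorded for a page; 0 if the page is unknown. */
    int count(String page) {
        List<String> messages = messagesByPage.get(page);
        return messages == null ? 0 : messages.size();
    }

    /** Renders one header line per page, followed by its tab-indented messages. */
    String render() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, List<String>> entry : messagesByPage.entrySet()) {
            out.append(entry.getKey())
               .append(" (").append(entry.getValue().size()).append(" issue(s))\n");
            for (String message : entry.getValue()) {
                out.append("\t").append(message).append("\n");
            }
        }
        return out.toString();
    }
}
```

Inside the loop of the test, you would call something like `log.add(pageName, "ERROR", err.toString())` for each error and the same with `"WARNING"` for each warning, then write `log.render()` to a file once all pages have been checked.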

Upvotes: 0

yegor256

Reputation: 105083

You can use the W3C validator directly from Java; see w3c-jabi.

Upvotes: 5

newguy

Reputation: 5976

After extensive research and a bit of code hacking, I've managed to use JTidyFilter in my project, and it is working beautifully now. JTidyFilter is part of JTidyServlet, a sub-project of JTidy written about five years ago. Recently they updated the code to comply with the Java 5 compiler. I downloaded the source, upgraded some dependencies and, most importantly, changed some lines in the JTidyFilter class, which handles the filtering, and finally got it to work nicely in my project.

There are still some issues with the reformatted HTML: I see one or two errors when I use the Firefox HTML validation plugin, but otherwise most pages pass validation.

Upvotes: 1

Robert Hui

Reputation: 744

There is also an experimental API available from W3C to help automate validation. They kindly ask that you throttle requests, and also offer instructions on setting up a validator on a local server. It's definitely more work, but if you're generating a lot of HTML pages, it would probably make sense to also automate the validation.

http://validator.w3.org/docs/api.html
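As a rough sketch of what throttled, automated validation could look like with only the JDK (Java 11+): the class and method names below are hypothetical, and the `uri` and `output=soap12` parameters and the `X-W3C-Validator-Errors` / `X-W3C-Validator-Warnings` response headers are as I recall them from the API documentation linked above, so please verify them there before relying on this.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical helper: builds validator requests and spaces them out. */
final class ThrottledValidatorClient {

    /** Builds the GET URI that asks the validator to check a publicly reachable page. */
    static URI checkUri(String validatorBase, String pageUrl) {
        // Assumption: "uri" and "output=soap12" are the documented parameter names.
        String query = "uri=" + URLEncoder.encode(pageUrl, StandardCharsets.UTF_8)
                + "&output=soap12";
        return URI.create(validatorBase + "?" + query);
    }

    /** Milliseconds still to wait so that requests are at least minIntervalMillis apart. */
    static long delayMillis(long lastRequestMillis, long nowMillis, long minIntervalMillis) {
        long elapsed = nowMillis - lastRequestMillis;
        return Math.max(0L, minIntervalMillis - elapsed);
    }

    /** Checks each page in turn, sleeping between requests, and prints one line per page. */
    static void checkAll(String validatorBase, List<String> pageUrls, long minIntervalMillis)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        long lastRequest = 0L; // no previous request, so the first one is sent immediately
        for (String pageUrl : pageUrls) {
            Thread.sleep(delayMillis(lastRequest, System.currentTimeMillis(), minIntervalMillis));
            lastRequest = System.currentTimeMillis();
            HttpRequest request = HttpRequest.newBuilder(checkUri(validatorBase, pageUrl))
                    .GET()
                    .build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            // Assumption: the validator reports its counts in these response headers.
            String errors = response.headers()
                    .firstValue("X-W3C-Validator-Errors").orElse("?");
            String warnings = response.headers()
                    .firstValue("X-W3C-Validator-Warnings").orElse("?");
            System.out.println(pageUrl + ": " + errors + " error(s), "
                    + warnings + " warning(s)");
        }
    }
}
```

Calling `checkAll("http://validator.w3.org/check", pages, 1000L)` would leave at least one second between requests; the interval is my own placeholder, not a figure from W3C. Pointing `validatorBase` at a local installation avoids loading the public service when you have many pages.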

Upvotes: 2
