user1877566

Reputation: 1

w3validator API - testing XHTML1 DOCTYPE

I am building a newsletter builder in PHP. One of my requirements is that once an email has been composed in HTML, it is checked against the W3C standards, and the end user is shown a notice if the validation run finds any errors.

At the moment I am using the W3C validator API via a PHP cURL request, following this: https://github.com/validator/validator/wiki/Service:-Input:-POST-body

My problem is that I can't get the validator to process the HTML content using the XHTML1 doctype. By default it expects the HTML5 doctype, and although there is a query string parameter ('parser') for changing this, the oldest version I seem to be able to test against is HTML4.

I have also tried leaving the 'parser' parameter blank and setting it to 'html', which should have made the validator use the doctype declared in the HTML content, but this doesn't work either.

Is it possible to use the W3C validator API to validate XHTML1? If not, is there an alternative API that would allow us to do so?

Upvotes: 0

Views: 189

Answers (1)

sideshowbarker

Reputation: 88235

Maintainer of the W3C HTML checker (validator) here.

To check documents against the XHTML1 schema, you need to send:

  • the schema query param with value http://s.validator.nu/xhtml10/xhtml-strict.rnc
  • a Content-Type header with value application/xhtml+xml; charset=utf-8

For example, using curl to send a request, it would look like this:

curl -H "Content-Type: application/xhtml+xml; charset=utf-8" \
--data-binary @FILE.xhtml \
'https://validator.w3.org/nu/?schema=http://s.validator.nu/xhtml10/xhtml-strict.rnc&out=json'

…where FILE.xhtml is replaced with whatever the name is of the actual file you want to check, and the out=json query param specifies that you want JSON-formatted results from the checker. (Use out=xml if you want XML-formatted results, or out=gnu for results in the GNU error format.)
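
If you run this check often, the command above can be wrapped in a small shell function so that the file name and the output format become arguments. This is only a sketch: the function name check_xhtml and its argument order are made up for illustration and are not part of the checker. The -s flag just silences curl's progress output.

```shell
# Hypothetical helper (not part of the checker): check_xhtml FILE [FORMAT]
# FORMAT is passed through as the "out" query param (json, xml, or gnu)
# and defaults to json when omitted.
check_xhtml() {
  file=$1
  format=${2:-json}
  curl -s -H "Content-Type: application/xhtml+xml; charset=utf-8" \
    --data-binary "@${file}" \
    "https://validator.w3.org/nu/?schema=http://s.validator.nu/xhtml10/xhtml-strict.rnc&out=${format}"
}
```

For example, check_xhtml newsletter.xhtml gnu would request results in the GNU error format for a file named newsletter.xhtml (a placeholder name).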

http://s.validator.nu/xhtml10/xhtml-strict.rnc is just an identifier the checker recognizes internally for the XHTML 1.0 Strict schema. There’s no actual schema on the Web at that URL.

The list of such identifiers that the checker recognizes is in the following file:

https://github.com/validator/validator/blob/master/resources/presets.txt

Note that you can include some additional checks by adding other identifiers to the schema value:

curl -H "Content-Type: application/xhtml+xml; charset=utf-8" \
--data-binary @FILE.xhtml \
'https://validator.w3.org/nu/?schema=http://s.validator.nu/xhtml10/xhtml-strict.rnc%20http://s.validator.nu/html4/assertions.sch%20http://c.validator.nu/all-html4/&out=json'

The schema identifiers must be separated by %20 (percent-encoded space character).
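
Joining the identifiers by hand is easy to get wrong, so here is a minimal sketch of a shell function that builds the request URL from a list of schema identifiers. The name checker_url and its argument order are assumptions made for illustration. It only inserts %20 between identifiers and does no other escaping.

```shell
# Hypothetical helper (not part of the checker): checker_url FORMAT SCHEMA_ID...
# Joins the schema identifiers with %20 (a percent-encoded space) and
# prints the full checker URL with the requested "out" format.
checker_url() {
  format=$1
  shift
  schema=""
  for id in "$@"; do
    if [ -z "$schema" ]; then
      schema=$id
    else
      schema="${schema}%20${id}"
    fi
  done
  printf 'https://validator.w3.org/nu/?schema=%s&out=%s\n' "$schema" "$format"
}
```

Calling it with json followed by the three identifiers from the example above prints the same URL that was passed to curl there.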

Upvotes: 2
