stofl

Reputation: 2982

Extending the XHTML DTD to use special chars in ID attributes

I want to validate XML templates that are an XHTML extension. The templates contain special characters such as { and | in ID attributes. Is it possible to extend the XHTML DTD to override the restriction on the characters allowed in the ID attribute? Or are the allowed characters defined by the XML specification itself?

Upvotes: 4

Views: 300

Answers (1)

Ray Toal

Reputation: 88378

You cannot use the characters '{' and '|' directly in id attributes, because the XML specification says:

Values of type ID must match the Name production. A name must not appear more than once in an XML document as a value of this type; i.e., ID values must uniquely identify the elements which bear them.

The Name production is defined in the XML specification. If you expand the syntax rule, you see that the only characters allowed in a name are given by these productions:

[4] NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]

[4a] NameChar ::= NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]

Unfortunately, neither the left brace nor the pipe is allowed. Their code points are #x7B and #x7C respectively, which fall outside every accepted character range.
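To check candidate id values against these productions without going through a validator, the two rules can be transcribed into a regular expression. The following is only a sketch in Python: the names `NAME_START`, `NAME_CHAR`, `NAME_RE` and `is_xml_name` are made up for illustration, and the character ranges are copied from productions [4] and [4a] above.

```python
import re

# Character ranges from production [4] NameStartChar.
NAME_START = (
    r":A-Z_a-z"
    r"\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    r"\u0370-\u037D\u037F-\u1FFF\u200C-\u200D"
    r"\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    r"\uF900-\uFDCF\uFDF0-\uFFFD"
    r"\U00010000-\U000EFFFF"
)

# Production [4a] NameChar adds "-", ".", digits and a few more ranges.
NAME_CHAR = NAME_START + r"\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"

# A Name is one NameStartChar followed by any number of NameChars.
NAME_RE = re.compile(f"[{NAME_START}][{NAME_CHAR}]*")


def is_xml_name(value: str) -> bool:
    """Return True if value matches the XML Name production."""
    return NAME_RE.fullmatch(value) is not None


for candidate in ["anid", "ani{d", "ani|d", "1abc", "_a-1.b"]:
    print(repr(candidate), is_xml_name(candidate))
```

Here "anid" and "_a-1.b" are accepted, while "ani{d", "ani|d" and "1abc" are rejected (a digit is a NameChar but not a NameStartChar).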

TL;DR: the legal characters for ID attributes are owned by the XML spec and your two characters are not legal.

ADDENDUM

Here are some examples. The following document passes validation for XHTML on the W3C validation site:

<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" 
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US" xml:lang="en-US">
  <head>
    <title>A title</title>
  </head>
  <body id="anid">
  </body>
</html>

but the following will not:

<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" 
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US" xml:lang="en-US">
  <head>
    <title>A title</title>
  </head>
  <body id="ani{d">
  </body>
</html>

We get the error:

Line 8, Column 16: character "{" is not allowed in the value of attribute "id"

Now it's rather interesting that if you really want the left curly bracket in the id name, you can try this:

<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" 
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US" xml:lang="en-US">
  <head>
    <title>A title</title>
  </head>
  <body id="ani&#x7B;d">
  </body>
</html>

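But you get the same error! You might want to try this yourself: the validator's source listing shows the line with the character reference &#x7B;, yet the error message still reports a left brace. That is because the parser expands character references in attribute values before the value is checked against the ID rules, so escaping the character changes nothing.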
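This behaviour can be reproduced outside the validator. A minimal sketch, assuming Python's standard-library `xml.etree.ElementTree`: it is a non-validating parser, so it accepts the element, but it still resolves the character reference, and the attribute value it hands back already contains the literal brace.

```python
import xml.etree.ElementTree as ET

# The id is written with a character reference instead of a literal "{".
body = ET.fromstring('<body id="ani&#x7B;d"></body>')

# By the time the application sees the attribute, the reference is expanded.
id_value = body.get("id")
print(id_value)  # ani{d
```

A validating parser sees the same expanded value, which is why the escaped form fails in exactly the same way as the literal one.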

The bottom line is that you simply cannot have ids with characters other than those allowed by the XML specification.

Upvotes: 5
