ScottProuty

Reputation: 203

Well-formed XML? Entity and character references within tag names and attribute names

I have been trying to confirm my reading of the XML spec. My interpretation is that predefined entity references and numeric character references are not allowed in tag names or attribute names. For example, I believe this is not allowed by the XML 1.0 spec:

<root>
<test&apos;&#x27;&#39;tag test&apos;&#x27;&#39;attribute="one"/>
</root>

However, I have one parser that returns test'''tag for the tag name and test'''attribute for the attribute name while another parser returns test&apos;&#x27;&#39;tag for the tag name and test&apos;&#x27;&#39;attribute for the attribute name.

Which parser is correct? Or are they both wrong (i.e. should they report a well-formedness error)?

Thanks!

Upvotes: 4

Views: 1258

Answers (3)

StaxMan

Reputation: 116590

This is very simple: no entity references can be used within names. Both "parsers" are wrong here. The XML specification defines this quite clearly: there are no hidden default rules, and a construct that the grammar does not include is not allowed.

Entity references can only be used within regular character content and attribute values. The same syntax can appear in some other places (comments, processing instructions, DTD subsets), but there it won't be expanded (i.e. it is not recognized as an entity reference).
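
As a rough illustration of the point above, here is a minimal Python sketch that feeds the question's document to the standard library's `xml.etree.ElementTree` (which is backed by expat). The names `BAD_DOC`, `GOOD_DOC` and `parse_or_none` are made up for this example, and it only shows how this one parser behaves, not what every parser does.

```python
import xml.etree.ElementTree as ET

# The document from the question: references inside the tag and attribute names.
BAD_DOC = """<root>
<test&apos;&#x27;&#39;tag test&apos;&#x27;&#39;attribute="one"/>
</root>"""

# The same references moved to where they are recognized:
# an attribute value and regular character content.
GOOD_DOC = """<root>
<testtag attribute="&apos;&#x27;&#39;">&apos;&#x27;&#39;</testtag>
</root>"""


def parse_or_none(text):
    """Return the root element, or None if the parser rejects the document."""
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        return None


bad = parse_or_none(BAD_DOC)    # rejected: '&' cannot appear in a name
good = parse_or_none(GOOD_DOC)  # accepted: references are expanded
child = good[0]
print(bad)
print(repr(child.get("attribute")), repr(child.text))
```

With this parser, the first document is rejected outright, while in the second one all three references expand to an apostrophe in both the attribute value and the text.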

Upvotes: 2

dommer

Reputation: 19820

It seems to me that they are both wrong. According to the spec, only the following characters may appear in a tag or attribute name (and those from "-" onward may not be the first character):

":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF] | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]

XMLSpy certainly isn't happy with it either. Nor <Oxygen/>.

And...just for good measure...here's what .NET had to say about it:

The '&' character, hexadecimal value 0x26, cannot be included in a name. Line 1, position 12.

What parsers are you using?

Upvotes: 2

17 of 26

Reputation: 27382

In digging around at w3.org, I found the following relevant pieces:

[41] Attribute ::= Name Eq AttValue [VC: Attribute Value Type] [WFC: No External Entity References] [WFC: No < in Attribute Values]

[WFC: No External Entity References] links to:

Well-formedness constraint: No External Entity References
Attribute values MUST NOT contain direct or indirect entity references to external entities.

Name links to:

[5] Name ::= NameStartChar (NameChar)*

[4] NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]

[4a] NameChar ::= NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]

Yes, it's as clear as mud! My interpretation of this would be that you could use hexadecimal character references as long as they fell in the ranges specified above, but that you could not use predefined entity references.

I would expect a well-formedness error when the names don't conform to this.
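
For anyone who wants to test candidate names against the productions quoted above, here is a small Python sketch that transcribes [4], [4a] and [5] into a regular expression. The names `NAME_START`, `NAME_CHAR`, `NAME_RE` and `is_xml_name` are invented for this example, and it checks the literal characters of the name only. It does not model the rest of the grammar.

```python
import re

# [4] NameStartChar, transcribed as the inside of a regex character class.
NAME_START = (
    r":A-Z_a-z"
    r"\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    r"\u0370-\u037D\u037F-\u1FFF\u200C-\u200D"
    r"\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    r"\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)

# [4a] NameChar adds "-", ".", digits and a few combining/extender ranges.
NAME_CHAR = NAME_START + r"\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"

# [5] Name ::= NameStartChar (NameChar)*
NAME_RE = re.compile("[%s][%s]*" % (NAME_START, NAME_CHAR))


def is_xml_name(candidate):
    """True if the whole string matches the Name production."""
    return NAME_RE.fullmatch(candidate) is not None


print(is_xml_name("testtag"))                    # plain ASCII name
print(is_xml_name("test&apos;&#x27;&#39;tag"))   # name as written in the question
print(is_xml_name("test'''tag"))                 # name after expanding the references
```

Neither "&" (#x26) nor the apostrophe (#x27) falls in any of the listed ranges, so the name from the question fails this check both as written and with the references expanded.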

Upvotes: 0
