Peter Alexander

Reputation: 54300

XML Attribute Value Normalization

I'm reading the XML specification at W3C, and this part of the section on attribute value normalization caught my attention:

If the attribute type is not CDATA, then the XML processor MUST further process the normalized attribute value by discarding any leading and trailing space (#x20) characters, and by replacing sequences of space (#x20) characters by a single space (#x20) character.

Does this mean that

<tag attr=" a      b " />

is equivalent to

<tag attr="a b" />

Or am I misinterpreting what the specification says?

Upvotes: 3

Views: 1895

Answers (2)

Daniel Haley

Reputation: 52888

Here's an example to supplement the correct answer by @Per Norrman (+1) and the example you used in your question.

<!DOCTYPE tag [
<!ELEMENT tag EMPTY>
<!ATTLIST tag
          attr NMTOKENS #IMPLIED>
]>
<tag attr=" a      b "/>

is equivalent to

<!DOCTYPE tag [
<!ELEMENT tag EMPTY>
<!ATTLIST tag
          attr NMTOKENS #IMPLIED>
]>
<tag attr="a b"/>

because the attribute type of attr is NMTOKENS (plural).

However, the following would not be equivalent to the NMTOKENS examples, because attr is literal text (CDATA = character data):

<!DOCTYPE tag [
<!ELEMENT tag EMPTY>
<!ATTLIST tag
          attr CDATA #IMPLIED>
]>
<tag attr=" a      b "/>

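Because the attribute type of attr is CDATA, the additional normalization step is skipped and the value keeps its leading, trailing, and repeated spaces.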
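
To make the rule quoted in the question concrete, here is a minimal Python sketch of the extra step applied to non-CDATA attributes. The function name normalize_tokenized is made up for this illustration and is not part of any XML library. It assumes the first normalization pass (which turns literal tabs and line breaks into #x20) has already run.

```python
def normalize_tokenized(value: str) -> str:
    """Extra normalization step for non-CDATA attribute types (hypothetical helper).

    Drops leading and trailing #x20 characters and collapses each run of
    #x20 into a single #x20. Only the space character is handled, because
    the quoted rule mentions #x20 only.
    """
    # split(" ") yields empty strings for leading, trailing, and repeated
    # spaces; filtering them out and re-joining covers both parts of the rule.
    return " ".join(token for token in value.split(" ") if token)


raw = " a      b "
tokenized_value = normalize_tokenized(raw)  # attr declared as NMTOKENS
cdata_value = raw                           # attr declared as CDATA: step skipped

print(repr(tokenized_value))
print(repr(cdata_value))
```

The two printed values correspond to the NMTOKENS and CDATA declarations above: only the first one has its spaces trimmed and collapsed.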

Upvotes: 2

forty-two

Reputation: 12817

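Your interpretation is correct, provided that the type of 'attr' is not CDATA. Most probably it is CDATA, though: an attribute for which no declaration has been read is treated as CDATA by a non-validating processor, and in that case the extra spaces are kept.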
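
If the document has no DOCTYPE, a parser reads no declaration for attr and, as far as I know, handles it as CDATA. Here is a minimal sketch to check this, using Python's standard xml.etree.ElementTree (my choice for illustration; the question does not name a parser):

```python
import xml.etree.ElementTree as ET

# No DOCTYPE here, so the parser reads no declaration for attr.
element = ET.fromstring('<tag attr=" a      b " />')
value = element.get("attr")

# The leading, trailing, and repeated spaces are expected to survive,
# so the value is not the same as "a b".
print(repr(value))
```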

The annotated XML specification helped me a lot when scrutinizing the details: http://www.xml.com/axml/testaxml.htm

Upvotes: 4
