user166390

What advantage does requiring < to be encoded as &lt; in an XML attribute value provide?

Why must < be encoded in an XML attribute value while > does not require this encoding?

Obviously, it is "because the specification says so"...

...however, I am looking for a simple, practical example of, or reason for, why this restriction exists from a parsing viewpoint, given that > does not need to be escaped in this context.

I cannot think of any, but without trying to determine the original rationale (unless it is documented), I would like to see a case for what parsing or syntax complications this restriction removes, or what it enables. Is it used or useful for "seeking"? Does it allow attribute values and text content to be parsed the same way? Something else?
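For concreteness, the behavior in question can be reproduced with a minimal sketch using Python's standard-library xml.etree.ElementTree (which is backed by the Expat parser). The helper name try_parse is purely illustrative. A literal > inside a quoted attribute value parses as ordinary data, a literal < is rejected as not well-formed, and the character reference &lt; is accepted and decoded back to <.

```python
import xml.etree.ElementTree as ET


def try_parse(doc: str):
    """Illustrative helper: return the root element's attributes,
    or the parser's error message if the document is not well-formed."""
    try:
        return ET.fromstring(doc).attrib
    except ET.ParseError as err:
        return f"ParseError: {err}"


# A literal ">" inside a quoted attribute value is ordinary data.
print(try_parse('<a b="x > y"/>'))     # -> {'b': 'x > y'}

# A literal "<" inside a quoted attribute value is rejected by the parser.
print(try_parse('<a b="x < y"/>'))

# Written as the character reference &lt; it is accepted and decoded.
print(try_parse('<a b="x &lt; y"/>'))  # -> {'b': 'x < y'}
```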

Upvotes: 0

Views: 119

Answers (2)

Michael Kay

Reputation: 163262

There are two ways of answering a "why" question:

(a) can you think of any purpose that is served by having this rule?

(b) as a matter of historical fact, do you know why members of the working group voted the way they did, if indeed the question ever came up for a formal decision?

The historical approach (b) is very difficult; even those who were present at a meeting where the decision was made sometimes have difficulty in knowing why the committee decided the way it did, and working it out from the minutes is usually impossible. It might have been late in the afternoon; they might have become impatient with the person who was proposing the change, etc.

Usually, though, oddities in the XML spec can be traced to its origins in SGML. The XML Working Group were anxious to ensure that nothing was allowed in XML that wasn't allowed in SGML, and SGML imposed all sorts of restrictions as a consequence of its syntactic flexibility, for example allowing attributes without delimiting quotes. I'm no SGML expert, so I can't be more precise than that, but I would be 90% certain this is the explanation.

Upvotes: 1

hobbs

Reputation: 239652

It could be a self-synchronizing thing; if you guarantee that "<" doesn't occur in the interior of an XML document except in its role as a tag starter, then you can start in the middle of a document, skip characters until you see a "<", and then begin parsing, confident that you're at the beginning of a tag. Naïvely this doesn't seem to be all that useful — parsing XML beginning from the middle doesn't make a lot of sense — but maybe it has implications for error recovery. There's no such reason to worry about ">" in the same way, as long as "<" is protected.
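The resynchronization idea can be sketched in a few lines of Python. The resync helper below is hypothetical (it is not part of any library), and it assumes the document contains only elements, attributes and character data, with no comments, CDATA sections, processing instructions or DOCTYPE, since comments, CDATA sections and processing instructions may legitimately contain a literal < and would break the "every < starts a tag" guarantee. Starting from an arbitrary offset, it skips to the next < and reports whether a start tag or an end tag begins there.

```python
def resync(doc: str, offset: int):
    """Hypothetical helper: skip from an arbitrary offset to the next '<'
    and report (position, "start" | "end", tag name).

    Assumes the document holds only elements, attributes and character data
    (no comments, CDATA sections, processing instructions or DOCTYPE).
    """
    # Under that assumption every literal '<' begins a tag, because neither
    # character data nor attribute values may contain an unescaped '<'.
    start = doc.find("<", offset)
    if start == -1:
        return None  # no more markup after offset
    i = start + 1
    kind = "start"
    if doc.startswith("/", i):  # "</name" is an end tag
        kind = "end"
        i += 1
    # The tag name runs until whitespace, "/" (empty-element tag) or ">".
    j = i
    while j < len(doc) and doc[j] not in " \t\r\n/>":
        j += 1
    return start, kind, doc[i:j]


doc = '<root><item note="a > b">text</item><item/></root>'

# Offset 19 lands inside the attribute value "a > b"; its '>' is skipped harmlessly.
print(resync(doc, 19))  # -> (29, 'end', 'item')
print(resync(doc, 30))  # -> (36, 'start', 'item')
```

Starting inside the attribute value a > b is harmless: the stray > is simply skipped. If a literal < were allowed there (say note="a < b"), the same jump could land inside the attribute value and misreport it as a tag. The sketch only locates where a tag begins; finding where it ends would still require tracking quotes, because > may appear inside attribute values.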

Upvotes: 1
