Reputation: 113
Will something "break" if I use numeric entities instead of the usual recommended alpha entities for reserved chars in XML?
This is part of a rather complex app that allows users to enter bibliographic metadata via XML, CSV or web-based forms. This data can then be extracted in XML (using the ONIX standard) with user-chosen encodings: utf-8, win-1252, etc.
The original programmers (long gone now...) decided to use numeric entities for all chars that cannot be represented in the chosen encoding. XML-reserved chars are considered as non-representable under any encoding. They are given the same treatment and are encoded using numeric entities.
Some users have complained about &, <, >, etc. being encoded as &#38;, &#60;, &#62;, etc. instead of using the usual alpha codes (&amp;, &lt;, &gt;), and I'd like to know if these complaints have any substance.
If I can avoid digging through the legacy code to change this behaviour, it would save me a lot of resources.
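For reference, here is a minimal sketch of the behaviour described above. It is not the legacy code itself; the function name `escape_numeric` and the `RESERVED` table are hypothetical, and it assumes Python's built-in `xmlcharrefreplace` error handler is an acceptable stand-in for whatever the original programmers wrote.

```python
# Hypothetical table: reserved characters mapped to decimal character references.
RESERVED = {"&": "&#38;", "<": "&#60;", ">": "&#62;"}


def escape_numeric(text: str, encoding: str) -> bytes:
    """Escape reserved characters, then encode to the chosen encoding.

    Characters the target encoding cannot represent are emitted as decimal
    character references by the standard 'xmlcharrefreplace' error handler.
    """
    # Single pass per character, so the ampersands introduced by the
    # references themselves are never escaped a second time.
    escaped = "".join(RESERVED.get(ch, ch) for ch in text)
    return escaped.encode(encoding, errors="xmlcharrefreplace")


if __name__ == "__main__":
    # The Greek capital omega (U+03A9) has no cp1252 representation.
    print(escape_numeric("A & B < \u03a9", "cp1252"))
    print(escape_numeric("A & B < \u03a9", "utf-8"))
```

With `cp1252` the omega comes out as a character reference; with `utf-8` it is written as raw bytes, while the reserved characters are escaped numerically in both cases.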
Upvotes: 3
Views: 6679
Reputation: 52888
Yes, it's fine to escape using numeric character references.
From the spec (emphasis mine):
The ampersand character (&) and the left angle bracket (<) must not appear in their literal form, except when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. If they are needed elsewhere, they must be escaped using either numeric character references or the strings "&amp;" and "&lt;" respectively. The right angle bracket (>) may be represented using the string "&gt;", and must, for compatibility, be escaped using either "&gt;" or a character reference when it appears in the string "]]>" in content, when that string is not marking the end of a CDATA section.
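You could also use a hexadecimal character reference. The named, decimal and hexadecimal forms are equivalent:
&amp; = &#38; = &#x26;
&lt; = &#60; = &#x3C;
&gt; = &#62; = &#x3E;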
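As a quick sanity check, the sketch below parses the same content written with each style of reference and compares the results. It assumes Python's standard `xml.etree.ElementTree` as the consumer; the `FORMS` table and the `parsed_texts` helper are illustrative names, and other conforming parsers should behave the same way but are not tested here.

```python
import xml.etree.ElementTree as ET

# The same three characters written with each style of reference.
FORMS = {
    "named": "<r>&amp; &lt; &gt;</r>",
    "decimal": "<r>&#38; &#60; &#62;</r>",
    "hex": "<r>&#x26; &#x3C; &#x3E;</r>",
}


def parsed_texts(forms=FORMS):
    """Parse each document and return the text content of its root element."""
    return {name: ET.fromstring(doc).text for name, doc in forms.items()}


if __name__ == "__main__":
    texts = parsed_texts()
    for name, text in texts.items():
        print(f"{name:8} {text!r}")
    # All three documents carry identical character data after parsing.
    assert len(set(texts.values())) == 1
```

If a downstream tool does break on the numeric form, that points to something treating the XML as plain text (for example a regex or string search for `&amp;`) rather than running it through an XML parser.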
Upvotes: 6