Fai

Reputation: 354

Storing the value NUL (ASCII) in XML

Is it possible to save the ASCII NUL character in XML like this <data>*NUL**NUL**NUL*</data>?

I know I can display this value in Java using System.out.println("\0") and I wonder if XML can handle this value.

My objective is to get "\0\0\0" from XML using Java.

Thank you in advance!

Upvotes: 5

Views: 7278

Answers (4)

Roland

Reputation: 7853

NUL (U+0000) is not allowed in XML 1.0 or 1.1.

Wikipedia: Valid characters in XML

Note that the code point U+0000, assigned to the null control character, is the only character encoded in Unicode and ISO/IEC 10646 that is always invalid in any XML 1.0 and 1.1 document.
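This can be checked against the XML parser that ships with the JDK. The following is only a minimal sketch; the class name NulInXml and the helper parses are made up for illustration. It feeds the parser a raw NUL and the character reference &#0; and reports whether each document is accepted.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class NulInXml {
    /** Hypothetical helper: true if the JDK's built-in DOM parser accepts the XML text. */
    static boolean parses(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Well-formedness errors, including invalid characters, end up here.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parses("<data>abc</data>"));   // true
        System.out.println(parses("<data>\0</data>"));    // false: raw NUL in content
        System.out.println(parses("<data>&#0;</data>"));  // false: character reference to NUL
    }
}
```

The default error handler may also print a "[Fatal Error]" line to stderr for the two rejected documents.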

Upvotes: 3

Joop Eggen

Reputation: 109532

By the XML 1.0 spec, it is officially not allowed.

The ASCII NUL, aka '\0' aka \u0000, is a normal character in Java. In C/C++, however, it is used as a string terminator, so C software processing the XML would probably detect the end of the XML text far too early.

For this, Java also has a workaround: when XML is written in the UTF-8 encoding, Unicode values > 127 are encoded as multi-byte sequences with the 8th bit set. DataOutputStream.writeUTF uses a modified UTF-8 that also writes '\0' as a multi-byte sequence (0xC0 0x80). DataInputStream.readUTF reads it back normally, so the decoding works.

  • This is not strict UTF-8, which requires the shortest encoding.
  • I am still unsure whether C code processing the XML DOM would run into errors.

So it is not a good idea.

Also keep in mind that binary data should be converted to Base64 ASCII instead, as UTF-8 is not suited for binary data.
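The difference between the encoding used by DataOutputStream and standard UTF-8 can be seen by comparing the bytes each produces for a single NUL. This is a small sketch only; ModifiedUtf8Demo and modifiedUtf8 are illustrative names, and the helper drops the 2-byte length prefix that writeUTF puts in front of the data.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ModifiedUtf8Demo {
    /** Hypothetical helper: encodes s with writeUTF and strips the 2-byte length prefix. */
    static byte[] modifiedUtf8(String s) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        }
        byte[] all = buffer.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    public static void main(String[] args) throws IOException {
        // Modified UTF-8: NUL becomes the two bytes 0xC0 0x80 (shown as signed bytes).
        System.out.println(Arrays.toString(modifiedUtf8("\0")));                    // [-64, -128]
        // Standard UTF-8: NUL is the single byte 0x00.
        System.out.println(Arrays.toString("\0".getBytes(StandardCharsets.UTF_8))); // [0]
    }
}
```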

Upvotes: 3

JojOatXGME

Reputation: 3296

I have not read the XML standard, but since Python's ElementTree complains that it is not a valid XML character, I think XML does not support it. You could implement an escape mechanism and represent "\0" with "\\0". Another possibility is to use the common Base64 encoding.

In Java, it may look like this:

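import java.nio.charset.StandardCharsets;
import java.util.Base64;

// write data to element (Base64 output is plain ASCII, so NUL bytes are safe)
String data = ...
element.setText(Base64.getEncoder().encodeToString(data.getBytes(StandardCharsets.UTF_8)));

// read data from element
String decoded = new String(Base64.getDecoder().decode(element.getText()), StandardCharsets.UTF_8);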
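For the "\0\0\0" case from the question, a self-contained round trip might look like the sketch below. It uses the JDK's DOM parser instead of the element API above, and the names Base64NulRoundTrip, toXml and fromXml are made up for this example.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class Base64NulRoundTrip {
    /** Hypothetical helper: wraps the Base64 form of data in a <data> element. */
    static String toXml(String data) {
        String encoded = Base64.getEncoder().encodeToString(data.getBytes(StandardCharsets.UTF_8));
        return "<data>" + encoded + "</data>";
    }

    /** Hypothetical helper: parses the XML and decodes the root element's text. */
    static String fromXml(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        String encoded = doc.getDocumentElement().getTextContent();
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        String xml = toXml("\0\0\0");
        System.out.println(xml);                                 // <data>AAAA</data>
        System.out.println(fromXml(xml).equals("\0\0\0"));       // true
    }
}
```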

Upvotes: 2

dbasnett

Reputation: 11773

These are the possibilities for what the data might look like:

              <row>
                  <data>actual data</data>
              </row>
              <row>
                  <!--null using attr. n ="t"-->
                  <data n="t"></data>
              </row>
              <row>
                  <!--some other meaning-->
                  <data/>
              </row>

edit: If you want to represent multiple nulls, take the attribute route and let the attribute value say how many nulls there are.

              <row>
                  <!--null using attr. n ="3"-->
                  <data n="3"></data>
              </row>

which is three nulls in the example.
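On the reading side, the count attribute could be turned back into a Java string roughly as follows. This is a sketch under the assumption that n always holds a decimal count when present; the class NullCountAttribute and the method readData are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class NullCountAttribute {
    /** Hypothetical helper: returns the first <data> value, expanding n="count" into NULs. */
    static String readData(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element data = (Element) doc.getElementsByTagName("data").item(0);
        String n = data.getAttribute("n"); // empty string when the attribute is absent
        if (n.isEmpty()) {
            return data.getTextContent();
        }
        // A freshly allocated char[] is filled with '\0', so this yields n NUL characters.
        return new String(new char[Integer.parseInt(n)]);
    }

    public static void main(String[] args) throws Exception {
        String nulls = readData("<row><data n=\"3\"></data></row>");
        System.out.println(nulls.equals("\0\0\0"));                                 // true
        System.out.println(readData("<row><data>actual data</data></row>"));       // actual data
    }
}
```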

edit: This is valid XML, because the content is the two literal characters \ and 0, not an actual NUL:

              <row>
                  <data>\0</data>
              </row>

Your XML processor may not like it, but there is nothing wrong with it.
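With the literal \0 from the last example, an XML parser simply hands back a backslash followed by a zero, so turning it into a real NUL is left to the application. One possible convention, sketched here with the made-up names NulEscaping, escape and unescape, is to escape the backslash itself as well so that the mapping stays reversible.

```java
public class NulEscaping {
    /** Hypothetical convention: NUL becomes \0 and a backslash becomes \\. */
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '\\') {
                out.append("\\\\");
            } else if (c == '\0') {
                out.append("\\0");
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Reverses escape(): \0 becomes NUL, \\ becomes a single backslash. */
    static String unescape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                char next = s.charAt(++i);
                out.append(next == '0' ? '\0' : next);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String escaped = escape("\0\0\0");
        System.out.println(escaped);                              // \0\0\0
        System.out.println(unescape(escaped).equals("\0\0\0"));   // true
    }
}
```

The escaped text still has to go through normal XML escaping for characters such as & and <.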

Upvotes: 2
