klapshin

Reputation: 801

ruamel.yaml ReaderError when trying to load special characters (Non-whitespace controls)

I am trying to load the big-list-of-naughty-strings into Python using ruamel.yaml (to test the character set for an API).

Everything loaded fine except for two lines: 115 and 120 (94 and 95 in the JSON version).

In the comments they are described as 'Non-whitespace C0 controls' and 'Non-whitespace C1 controls'.

Example:

>>> from ruamel.yaml import YAML
>>> ruamel_yaml = YAML()
>>> ruamel_yaml.load('\u000f')
...
ruamel.yaml.reader.ReaderError: unacceptable character #x000f: special characters are not allowed
  in "<unicode string>", position 0

I am wondering whether this is a bug or expected behavior.

Upvotes: 1

Views: 825

Answers (1)

Anthon

Reputation: 76568

It looks like you have not consulted the character set chapter in the YAML specification:

The allowed character range explicitly excludes the C0 control block #x0-#x1F (except for TAB #x9, LF #xA, and CR #xD which are allowed), DEL #x7F, the C1 control block #x80-#x9F (except for NEL #x85 which is allowed), the surrogate block #xD800-#xDFFF, #xFFFE, and #xFFFF.

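Line 115 contains the C0 controls and line 120 the C1 controls, so it should be no surprise that those lines don't load. The #x000f from your example falls in the C0 block, which is why the reader rejects it at position 0.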
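If you want to know in advance which strings will be rejected, you can check them against the ranges quoted above. The following is a minimal sketch using only the standard library; the helper names `is_yaml_printable` and `find_unacceptable` are made up for illustration, and the ranges are transcribed from the specification's printable set, not taken from ruamel.yaml's source, so treat it as an approximation of what the reader does.

```python
def is_yaml_printable(ch: str) -> bool:
    """Return True if ch is in the printable set described by the YAML spec.

    Hypothetical helper: the ranges are transcribed from the spec text,
    not copied from ruamel.yaml itself.
    """
    cp = ord(ch)
    return (
        cp in (0x09, 0x0A, 0x0D, 0x85)   # TAB, LF, CR, NEL are allowed
        or 0x20 <= cp <= 0x7E            # printable ASCII (excludes DEL #x7F)
        or 0xA0 <= cp <= 0xD7FF          # skips the C1 block and the surrogates
        or 0xE000 <= cp <= 0xFFFD        # excludes #xFFFE and #xFFFF
        or 0x10000 <= cp <= 0x10FFFF     # supplementary planes
    )


def find_unacceptable(text: str) -> list:
    """List (position, '#xNNNN') for every character outside the printable set."""
    return [
        (pos, f"#x{ord(ch):04x}")
        for pos, ch in enumerate(text)
        if not is_yaml_printable(ch)
    ]


if __name__ == "__main__":
    print(find_unacceptable("\u000f"))
    print(find_unacceptable("plain text\n"))
```

Running this on each line of the list before handing it to the loader should single out the same two entries. According to the specification, such characters can still be represented in a YAML document by writing them as escape sequences inside a double-quoted scalar (for example `"\x0f"`) instead of as raw characters.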

Upvotes: 1
