Reputation: 801
I am trying to load the big-list-of-naughty-strings into Python using ruamel.yaml (to test the character set for an API).
Everything loaded fine except for two lines: 115 and 120 (94 and 95 in the JSON version).
The comments describe them as 'Non-whitespace C0 controls' and 'Non-whitespace C1 controls'.
Example:
>>> from ruamel.yaml import YAML
>>> ruamel_yaml = YAML()
>>> ruamel_yaml.load('\u000f')
...
ruamel.yaml.reader.ReaderError: unacceptable character #x000f: special characters are not allowed
in "<unicode string>", position 0
Is this a bug or expected behavior?
Upvotes: 1
Views: 825
Reputation: 76568
It looks like you have not consulted the character set chapter in the YAML specification:
The allowed character range explicitly excludes the C0 control block #x0-#x1F (except for TAB #x9, LF #xA, and CR #xD which are allowed), DEL #x7F, the C1 control block #x80-#x9F (except for NEL #x85 which is allowed), the surrogate block #xD800-#xDFFF, #xFFFE, and #xFFFF.
Line 115 contains the C0 controls and line 120 the C1 controls, so it should be no surprise that those lines don't load.
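If you want to see in advance which characters in a string fall outside that range, something like the following might help. It is only a sketch: the regular expression is my own transcription of the allowed ranges in the spec, and `NON_PRINTABLE` and `find_disallowed` are names made up for this example, not part of ruamel.yaml's API.

```python
import re

# Hypothetical helper, not part of ruamel.yaml.
# Negated character class built from the YAML printable ranges:
# #x9 | #xA | #xD | #x20-#x7E | #x85 | #xA0-#xD7FF | #xE000-#xFFFD | #x10000-#x10FFFF
NON_PRINTABLE = re.compile(
    r"[^\x09\x0A\x0D\x20-\x7E\x85\xA0-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_disallowed(text):
    """Return a (position, code point) pair for every character outside the allowed range."""
    return [
        (match.start(), "#x{:04x}".format(ord(match.group())))
        for match in NON_PRINTABLE.finditer(text)
    ]


print(find_disallowed("\u000f"))       # the character from the question
print(find_disallowed("tab\tand\nNEL\x85"))  # TAB, LF and NEL are allowed
```

Running this over the two failing lines before handing them to the loader should list every offending position, rather than only the first one the reader stops at.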
Upvotes: 1