Reputation: 33
I get the error message "unacceptable character #x0095: special characters are not allowed in "<unicode string>", position 25" when loading a YAML string into a Python dictionary.
What would be the possible solution?
d = 'tended (Journaled)"\n - "\x95 Support plug and play"\n'
a = yaml.load(d)
The string is abridged and is not proper YAML, but I guess that is irrelevant in this case. I'm using Python 3.
Upvotes: 2
Views: 17010
Reputation: 76872
The YAML specification clearly states that a YAML stream only uses the printable subset of the Unicode character set. Except for NEL (\x85), characters in the C1 control block (i.e. characters \x80-\x9F) are not allowed.
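To make the "printable subset" concrete, here is a minimal sketch that uses only the standard library. The character class is my transcription of the printable ranges listed in the YAML 1.2 specification, and the names `NON_PRINTABLE` and `find_illegal` are just illustrative, not part of any YAML library:

```python
import re

# Printable characters as listed in the YAML 1.2 spec; anything that
# does NOT match one of these ranges is illegal in a YAML stream.
NON_PRINTABLE = re.compile(
    '[^\x09\x0A\x0D\x20-\x7E\x85\xA0-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]'
)

def find_illegal(text):
    """Return (index, character) pairs for every character YAML rejects."""
    return [(m.start(), m.group()) for m in NON_PRINTABLE.finditer(text)]

d = '"tended (Journaled)":\n - "\x95 Support plug and play"\n'
for index, char in find_illegal(d):
    print('position {}: U+{:04X}'.format(index, ord(char)))
```

Note that NEL (\x85) passes this check while the rest of the C1 block does not.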
This is almost valid YAML:
d = 'tended (Journaled)"\n - " Support plug and play"\n'
You just need a " in front of it and a : after the key:
d = '"tended (Journaled)":\n - " Support plug and play"\n'
(although I'm not sure if Journaled is correct English)
The following is not YAML:
d = '"tended (Journaled)":\n - "\x95 Support plug and play"\n'
because \x95 is in the C1 control block. You will have to replace those characters by hand, or drop them.
There is not much in ruamel.yaml that helps you convert such illegal characters, but you can use the Reader's illegal-character regex to scan for them and drop them:
from ruamel.yaml import YAML
from ruamel.yaml.reader import Reader

yaml = YAML(typ='safe')

def strip_invalid(s):
    res = ''
    for x in s:
        if Reader.NON_PRINTABLE.match(x):
            # res += '\\x{:x}'.format(ord(x))
            continue
        res += x
    return res

d = '"tended (Journaled)":\n - "\x95 Support plug and play"\n'
print(yaml.load(strip_invalid(d)))
which gives:
{'tended (Journaled)': [' Support plug and play']}
without any further manual intervention.
If you uncomment the line
# res += '\\x{:x}'.format(ord(x))
you get as output:
{'tended (Journaled)': ['\x95 Support plug and play']}
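If you would rather replace than drop or escape, one possibility is sketched below. It rests on an assumption about your data that I cannot verify: that the text originally came from a Windows-1252 source, where byte 0x95 is the bullet character (U+2022). The names `C1_CONTROLS` and `repair_c1` are made up for this example:

```python
import re

# C1 control block minus NEL (\x85), which YAML allows.
C1_CONTROLS = re.compile('[\x80-\x84\x86-\x9f]')

def repair_c1(text, fallback=''):
    """Reinterpret C1 control characters as Windows-1252 (an assumption
    about the data's origin); use `fallback` where cp1252 has no mapping."""
    def fix(match):
        try:
            return match.group().encode('latin-1').decode('cp1252')
        except UnicodeDecodeError:
            # A few code points (e.g. \x81) are undefined in cp1252.
            return fallback
    return C1_CONTROLS.sub(fix, text)

d = '"tended (Journaled)":\n - "\x95 Support plug and play"\n'
fixed = repair_c1(d)  # \x95 becomes the bullet U+2022
```

The resulting `fixed` string contains only printable characters and can be handed to `yaml.load()` as before. If the data did not come from Windows-1252, this mapping will produce the wrong characters, so check a sample first.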
Upvotes: 5
Reputation: 1169
You have to check the messy data for invalid characters. Fortunately, the exception raised by the YAML reader carries the necessary data:
import yaml

try:
    d = 'tended (Journaled)"\n - "\x95 Support plug and play"\n'
    a = yaml.safe_load(d)
except yaml.reader.ReaderError as e:
    # ReaderError is the YAMLError subclass that has these attributes
    print("Parsing YAML string failed")
    print("Reason:", e.reason)
    print("At position: {0} with encoding {1}".format(e.position, e.encoding))
    print("Invalid char code:", e.character)
If you run this code, it shows that the character \x95 is the culprit. You then have to replace or repair the data (or ask the user to) until no exception is thrown.
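The "repeat until no exception is thrown" step could be automated roughly like this. It is only a sketch: it assumes PyYAML, a `str` input (for which the reported position is an index into the string), and that dropping the offending character is acceptable. `load_dropping_illegal` and `max_repairs` are hypothetical names:

```python
import yaml

def load_dropping_illegal(text, max_repairs=100):
    """Retry safe_load, removing one rejected character per attempt.

    Returns the parsed data and a list of (position, char code) pairs
    for the characters that were removed."""
    removed = []
    for _ in range(max_repairs):
        try:
            return yaml.safe_load(text), removed
        except yaml.reader.ReaderError as e:
            removed.append((e.position, e.character))
            # Cut out the single character the reader complained about.
            text = text[:e.position] + text[e.position + 1:]
    raise ValueError('gave up after {} repairs'.format(max_repairs))

d = '"tended (Journaled)":\n - "\x95 Support plug and play"\n'
data, removed = load_dropping_illegal(d)
print(data)
```

Each retry re-scans the whole string, so for large inputs with many bad characters a single regex pass is the cheaper choice; the loop is mainly useful when you want to log or confirm each removal.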
Upvotes: 1