Reputation: 25362
The code below demonstrates my problem. It works as expected in Python 2.7, yet every encode call I've tried fails in Python 3.5 (see the exception below). Does anyone know of a way to circumvent this error and make it work in Python 3.5 as it did in Python 2.7?
import sys
if sys.version_info[0] <= 2:
    chr = unichr
out = chr(0xdfff)
print(repr(out)) # outputs '\udfff' both in Python 2 and 3
assert out.encode('utf-8').decode('utf-8') == out
assert out.encode('utf-8', errors='surrogateescape').decode('utf-8') == out
assert out.encode('utf-8', errors='strict').decode('utf-8') == out
Error in Python 3.5:
Traceback (most recent call last):
  File "W:\rocky40\Projects\etk\coilib50\source\python\coilib50\io\xmlpickle\snippet.py", line 8, in <module>
    assert out.encode('utf-8').decode('utf-8') == out
UnicodeEncodeError: 'utf-8' codec can't encode character '\udfff' in position 0: surrogates not allowed
Note that a different encoding wouldn't really suit me, as I have files written this way to disk by Python 2 and I need to be able to load them and dump them again in Python 3 so that Python 2 can read them back (so the actual bytes written shouldn't change).
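For reference, a minimal Python 3 sketch of why the handlers tried in the snippet fail. It assumes the standard behaviour of the utf-8 codec: 'strict' rejects every surrogate, and 'surrogateescape' only maps the range U+DC80..U+DCFF back to the raw bytes 0x80..0xFF, so U+DFFF falls outside what it can handle. The names failed and escaped are only illustrative.

```python
out = chr(0xdfff)

# Collect the handlers that refuse to encode the lone surrogate.
failed = []
for handler in ('strict', 'surrogateescape'):
    try:
        out.encode('utf-8', errors=handler)
    except UnicodeEncodeError:
        failed.append(handler)

# surrogateescape does work for surrogates in U+DC80..U+DCFF,
# which stand in for undecodable bytes 0x80..0xFF.
escaped = '\udc80'.encode('utf-8', errors='surrogateescape')

print(failed)
print(escaped)
```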
Upvotes: 2
Views: 1872
Reputation: 25362
After searching a bit more I noticed that https://docs.python.org/3/library/codecs.html#codec-base-classes points to a surrogatepass error handler, which is specific to the utf-X codecs, so using surrogatepass instead of surrogateescape does seem to do the trick and works properly on Python 3:
assert out.encode('utf-8', errors='surrogatepass').decode('utf-8', errors='surrogatepass') == out
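Since surrogatepass was only added in Python 3.1, code that has to run on both versions can pick the handler per interpreter. The following is a rough sketch, not a tested cross-version library: encode_utf8 and decode_utf8 are hypothetical helper names, and the Python 2 branch assumes (as the question reports) that the Python 2 utf-8 codec handles surrogates by default.

```python
import sys

PY2 = sys.version_info[0] <= 2


def encode_utf8(text):
    """Hypothetical helper: encode text, letting lone surrogates through."""
    if PY2:
        # Assumption: Python 2's utf-8 codec encodes surrogates as-is.
        return text.encode('utf-8')
    return text.encode('utf-8', errors='surrogatepass')


def decode_utf8(data):
    """Hypothetical helper: inverse of encode_utf8."""
    if PY2:
        return data.decode('utf-8')
    return data.decode('utf-8', errors='surrogatepass')


# unichr is only looked up on Python 2, so this line is safe on Python 3.
out = (unichr if PY2 else chr)(0xdfff)  # noqa: F821
raw = encode_utf8(out)

print(repr(raw))
print(decode_utf8(raw) == out)
```

On Python 3 the encoded form is the three bytes ED BF BF, the regular UTF-8 bit pattern for the code point U+DFFF, so the bytes on disk stay the same across the round trip.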
Upvotes: 2
Reputation: 16214
The problem is that this character is a surrogate code point, which belongs to UTF-16:
import sys
if sys.version_info[0] <= 2:
    chr = unichr
out = chr(0xdfff)
print(out.encode('utf-16-le', 'ignore').decode('utf-16-le', 'ignore') == out)
That runs without raising an exception, but... according to this answer, you will have problems with surrogates.
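One such problem can be seen in a short check, assuming Python 3.4 or later (where the utf-16 codecs reject lone surrogates and also accept surrogatepass): with 'ignore' the surrogate is silently dropped rather than preserved, so the round trip yields an empty string. The variable names below are only illustrative.

```python
out = chr(0xdfff)

# 'ignore' discards the character the codec cannot encode.
ignored = out.encode('utf-16-le', 'ignore')
round_trip_ignore = ignored.decode('utf-16-le', 'ignore')

# 'surrogatepass' keeps the code point in both directions.
passed = out.encode('utf-16-le', 'surrogatepass')
round_trip_pass = passed.decode('utf-16-le', 'surrogatepass')

print(round_trip_ignore == out)
print(round_trip_pass == out)
```

Even with surrogatepass, switching to UTF-16 changes the bytes on disk, which the question rules out.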
Upvotes: 0