Reputation: 19344
I have a script that writes a lot of text as raw Unicode escapes so that it can handle all the different languages.
The script works fine for ASCII characters and for non-Latin languages (Hindi, Chinese, etc.).
However, it fails to write the escaped value for characters such as "é" and "è".
Instead of writing the escape \u00E9, it writes "é" to the file, which in turn shows up as a diamond with a question mark on the web page.
f = codecs.open(newFilePathAndName(path,filename,language),encoding='raw_unicode_escape', mode='w')
...
f.write(outputString)
When I do a "print" in my script, it displays the character é as \xe9.
Any ideas?
The only thing that comes to mind is a regex that replaces \xe with \u00.
Upvotes: 0
Views: 3344
Reputation: 1123400
The raw_unicode_escape encoding indeed does not produce escapes for code points up to and including 0xFF; these values are not normally escaped in a raw Python unicode literal, so they are written out as single Latin-1 bytes.
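To make that concrete, here is a small sketch of what raw_unicode_escape does with three sample characters. It uses Python 3 syntax (the question is Python 2), and the `samples` dictionary is just my own illustration, not anything from the original script:

```python
# Python 3 sketch; the sample characters are illustrative assumptions.
samples = {"latin-1": "\u00e9", "hindi": "\u0915", "ascii": "a"}

raw = {name: ch.encode("raw_unicode_escape") for name, ch in samples.items()}

# U+00E9 is not above 0xFF, so it comes out as the single byte 0xE9, not as an escape.
print(raw["latin-1"])
# U+0915 is above 0xFF, so it becomes the six ASCII bytes of a \uXXXX escape.
print(raw["hindi"])
# Plain ASCII passes through unchanged.
print(raw["ascii"])
```

That lone 0xE9 byte is most likely what the browser shows as a replacement character, presumably because the page is decoded as UTF-8, where 0xE9 on its own is not a valid sequence.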
Use the unicode_escape encoding instead:
>>> print u'\u00e9'.encode('unicode_escape')
\xe9
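Note that this gives \xe9 rather than \u00e9. If whatever reads the file specifically needs the \uXXXX form, something like the following might do. This is only a sketch in Python 3 syntax, and `escape_non_ascii` is a hypothetical helper of my own, not a standard library function:

```python
def escape_non_ascii(text):
    # Hypothetical helper: ASCII passes through, every other character
    # becomes a backslash-u escape (backslash-U above U+FFFF).
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append("\\u{:04x}".format(cp))
        else:
            out.append("\\U{:08x}".format(cp))
    return "".join(out)

# Python 3 equivalent of the unicode_escape call above.
escaped = "caf\u00e9".encode("unicode_escape").decode("ascii")
print(escaped)

# The helper always uses the four-digit form for characters up to U+FFFF.
print(escape_non_ascii("caf\u00e9"))
```

Unlike unicode_escape, this helper leaves literal backslashes and control characters alone, so the output is not round-trippable if the text already contains something that looks like an escape.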
Upvotes: 2