Jason Rogers

Reputation: 19344

Python: raw_unicode_escape doesn't write raw value for "é"

I have a script that writes a bunch of text as raw Unicode escapes so that it can handle all the different languages.

The script works fine for ASCII characters and for non-Latin-based languages (Hindi, Chinese, etc.).

However, it fails to write the raw values for characters such as "é" and "è".

Instead of writing the raw Unicode value \u00E9, it writes "é" to the file, which in turn shows up as a diamond with a question mark on the webpage.

import codecs

f = codecs.open(newFilePathAndName(path, filename, language), encoding='raw_unicode_escape', mode='w')
...
f.write(outputString)

When I do a "print" in my script, it displays the character é as \xe9.

Any ideas?

The only thing that comes to mind is a regex that replaces \xe with \u00.

Upvotes: 0

Views: 3344

Answers (1)

Martijn Pieters

Reputation: 1123400

The raw_unicode_escape encoding indeed does not provide escapes for code points up to 0xFF (the Latin-1 range); these values are not normally escaped in a raw Python unicode literal.
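A quick way to see that behaviour is to encode a string that mixes a Latin-1 character with one outside that range. This is only an illustrative sketch; the sample string and the variable names are made up for the example.

```python
# Sample text (assumed for illustration): e-acute (U+00E9) and a Devanagari letter (U+0939)
text = u'\u00e9 \u0939'

raw = text.encode('raw_unicode_escape')

# U+00E9 is within Latin-1, so it is emitted as the single byte 0xE9;
# U+0939 is outside Latin-1, so it becomes the six ASCII characters \u0939.
print(repr(raw))
```

The lone 0xE9 byte is not valid UTF-8 on its own, which would explain the replacement character on a page served as UTF-8.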

Use the unicode_escape encoding instead:

>>> print u'\u00e9'.encode('unicode_escape')
\xe9
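If the file really needs the \uXXXX form for every non-ASCII character (unicode_escape gives \xe9 for é, as shown above), one option is to build the escapes by hand. The sketch below is not part of the codecs module; escape_non_ascii is a hypothetical helper name, and it assumes that plain ASCII, including existing backslashes, should pass through unchanged.

```python
def escape_non_ascii(text):
    # Hypothetical helper: replace every non-ASCII character with a
    # \uXXXX escape (or \UXXXXXXXX above U+FFFF); ASCII is left as-is.
    parts = []
    for ch in text:
        code = ord(ch)
        if code < 0x80:
            parts.append(ch)
        elif code <= 0xFFFF:
            parts.append(u'\\u%04x' % code)
        else:
            parts.append(u'\\U%08x' % code)
    return u''.join(parts)


print(escape_non_ascii(u'caf\u00e9 \u0939'))  # prints: caf\u00e9 \u0939
```

The result is pure ASCII, so it can be written with a plain 'ascii' or 'utf-8' file encoding. Because existing backslashes are not escaped, the output is not guaranteed to round-trip.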

Upvotes: 2
