Reputation: 19344
I'm writing a script that writes files in multiple languages, including non-ASCII text, so I'm handling the content as Unicode.
Here is a print of the data:
[['LATEST', u'\u0928\u0935\u0940\u0928\u0924\u092e'], ['RECOMMENDED', u'\u0938\u093f\u092b\u093e\u0930\u093f\u0936 \u0915\u093f\u092f\u093e \u0917\u092f\u093e']]
Here is the code I use to write the file:
f = codecs.open(file,encoding='utf-8', mode='w')
f.write(el)
This works fine for making the text appear in Hindi in a text editor, but because of the file format expected by the server, I need to write out the escape sequences directly:
\u0928\u0935\u0940\u0928\u0924\u092e
I'm currently running
os.system("native2ascii -encoding utf-8 ./output/nls_hi.properties ./output/nls_hi.properties")
but this takes too much time, and I can't help thinking there must be a way to write it the right way directly.
Any ideas?
Thanks
Jason
Upvotes: 1
Views: 1908
Reputation: 777
You probably want to use something like my_string.encode('raw_unicode_escape').
Alternatively, open the file with that encoding: f = codecs.open('bla.txt', encoding='raw_unicode_escape', mode='w')
Either way, the file will contain the escaped string: \u0928\u0935\u0940\u0928\u0924\u092e
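A minimal sketch of the encode-then-write approach, in case it helps. The helper name write_escaped, the temporary path, and the KEY=value line layout are assumptions made for illustration, not something taken from the question. Note that raw_unicode_escape leaves characters in the U+0080 to U+00FF range as single Latin-1 bytes instead of escaping them, which may or may not match what native2ascii produces for your data.

```python
import os
import tempfile


def write_escaped(path, text):
    # Encode first, then write bytes: code points above U+00FF become
    # \uXXXX sequences, while ASCII passes through unchanged.
    with open(path, 'wb') as f:
        f.write(text.encode('raw_unicode_escape'))


# Hypothetical output location and line layout, for illustration only.
path = os.path.join(tempfile.mkdtemp(), 'nls_hi.properties')
write_escaped(path, u'LATEST=\u0928\u0935\u0940\u0928\u0924\u092e\n')

# Read the file back as bytes to inspect what was actually written.
with open(path, 'rb') as f:
    raw = f.read()
```

Opening the file in binary mode keeps the escaped bytes exactly as encoded, with no newline translation on any platform.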
Upvotes: 4
Reputation: 400622
What file format does the server expect? Does it need a byte-order mark (BOM)? Whatever the answer, it's easiest to just directly use str.encode:
data = u'text with Unicode chars etc.'
# Open in binary mode, because encode() returns bytes
with open(filename, 'wb') as f:
    # For UTF-8, no BOM:
    f.write(data.encode('utf-8'))
For UTF-16, use data.encode('utf-16'), which will come with a BOM. If you don't want a BOM, explicitly use either utf-16le (little-endian) or utf-16be (big-endian).
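A quick way to check the BOM behaviour for yourself is roughly the following sketch. The variable names and the two-character sample string are picked for illustration only.

```python
import codecs

data = u'\u0928\u0935'

with_bom = data.encode('utf-16')   # BOM first, then platform byte order
little = data.encode('utf-16le')   # no BOM, low byte first
big = data.encode('utf-16be')      # no BOM, high byte first

# The plain 'utf-16' codec prepends one of the two possible BOMs.
has_bom = with_bom.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))
```

The BOM accounts for the two extra bytes in with_bom compared with the explicit-endian variants.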
Upvotes: 1