Jason Rogers

Reputation: 19344

Python: write unicode value in file

I'm writing a script that writes files in multiple languages, including non-ASCII text, so I'm writing the content as Unicode.

Here is a print of the data:

[['LATEST', u'\u0928\u0935\u0940\u0928\u0924\u092e'], ['RECOMMENDED', u'\u0938\u093f\u092b\u093e\u0930\u093f\u0936 \u0915\u093f\u092f\u093e \u0917\u092f\u093e']]

Here is the code I use to write it:

import codecs
with codecs.open(file, encoding='utf-8', mode='w') as f:
    f.write(el)

This works fine for making the text appear in Hindi in a text editor, but because of the file format expected by the server, I need to write out the escape sequences directly:

\u0928\u0935\u0940\u0928\u0924\u092e

I'm currently running

os.system("native2ascii -encoding utf-8 ./output/nls_hi.properties ./output/nls_hi.properties")

but this takes too much time and I can't help but think that there must be a way to directly write it the right way.

Any ideas?

Thanks

Jason

Upvotes: 1

Views: 1908

Answers (2)

user711413

Reputation: 777

You probably want to use something like my_string.encode('raw_unicode_escape'), or open the file with that codec directly:

f = codecs.open('bla.txt', encoding='raw_unicode_escape', mode='w')

Then the file will contain the escaped string: \u0928\u0935\u0940\u0928\u0924\u092e
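
For data shaped like the list in the question, a minimal sketch of the same idea might look like the following. The key=value layout, the write_escaped helper, and the temporary output path are assumptions for illustration, not something the question specifies. The text is encoded explicitly and written in binary mode so the snippet behaves the same on Python 2 and Python 3.

```python
import os
import tempfile

# Hypothetical sample data, shaped like the list printed in the question.
entries = [
    ['LATEST', u'\u0928\u0935\u0940\u0928\u0924\u092e'],
    ['RECOMMENDED', u'\u0938\u093f\u092b\u093e\u0930\u093f\u0936'],
]

def write_escaped(path, pairs):
    # Assumed layout: one key=value pair per line.
    # raw_unicode_escape turns characters above U+00FF into \uXXXX escapes.
    with open(path, 'wb') as f:
        for key, value in pairs:
            line = u'%s=%s\n' % (key, value)
            f.write(line.encode('raw_unicode_escape'))

# Temporary location used only for this example.
path = os.path.join(tempfile.mkdtemp(), 'nls_hi.properties')
write_escaped(path, entries)

# Read the bytes back to inspect what was actually written.
with open(path, 'rb') as f:
    raw = f.read()
```

One caveat: raw_unicode_escape leaves characters in the U+0080 to U+00FF range as single Latin-1 bytes instead of escaping them. That should be fine if the server reads the file as ISO-8859-1, which the use of native2ascii suggests, but it is an assumption worth checking.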

Upvotes: 4

Adam Rosenfield

Reputation: 400622

What file format does the server expect? Does it need a byte-order mark (BOM)? Whatever the answer, it's easiest to just directly use str.encode:

data = u'text with Unicode chars etc.'
# Binary mode, since encode() returns bytes
with open(filename, 'wb') as f:
    # For UTF-8, no BOM:
    f.write(data.encode('utf-8'))

For UTF-16, use data.encode('utf-16'), which will come with a BOM. If you don't want a BOM, explicitly use either utf-16le (little-endian) or utf-16be (big-endian).
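
To make the BOM behaviour concrete, here is a small sketch that compares the encoded bytes directly. The sample string is arbitrary, and the utf-8-sig line is an addition of mine for the case where a UTF-8 BOM is wanted; it is not something the question asks for.

```python
import codecs

data = u'\u0928\u0935'  # arbitrary two-character sample

with_bom = data.encode('utf-16')   # BOM first, then platform byte order
little = data.encode('utf-16le')   # little-endian, no BOM
big = data.encode('utf-16be')      # big-endian, no BOM

# The generic 'utf-16' codec prepends one of the two UTF-16 BOMs.
has_bom = with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

# 'utf-8-sig' is the UTF-8 variant that writes a BOM; plain 'utf-8' does not.
utf8_with_bom = data.encode('utf-8-sig')
utf8_plain = data.encode('utf-8')
```

Comparing len(with_bom) against len(little) shows the two extra BOM bytes, which is a quick way to check what a given codec name actually produces before sending a file to the server.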

Upvotes: 1
