roger

Reputation: 9893

Python: is there an easier way to write Unicode to a file?

I want to make sure all strings in my code are Unicode, so I use unicode_literals. Then I need to write a string to a file:

from __future__ import unicode_literals
with open('/tmp/test', 'wb') as f:
    f.write("中文") # UnicodeEncodeError

So I need to do this:

from __future__ import unicode_literals
with open('/tmp/test', 'wb') as f:
    f.write("中文".encode("utf-8"))
    f.write("中文".encode("utf-8"))
    f.write("中文".encode("utf-8"))
    f.write("中文".encode("utf-8"))

But then I have to call encode() on every write. I am lazy, so I switched to codecs:

from __future__ import unicode_literals
from codecs import open
import locale, codecs
lang, encoding = locale.getdefaultlocale()

with open('/tmp/test', 'wb', encoding) as f:
    f.write("中文")

Still, I think this is too much if I just want to write to a file. Is there an easier method?

Upvotes: 6

Views: 9503

Answers (2)

jfs

Reputation: 414079

You don't need to call .encode() and you don't need to call locale.getdefaultlocale() explicitly:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import io

with io.open('/tmp/test', 'w') as file:
    file.write(u"中文" * 4)

With no encoding argument, io.open() uses the character encoding returned by locale.getpreferredencoding(False) to save the Unicode text to the file.
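Because the result depends on the locale, the same script can produce different bytes (or fail with UnicodeEncodeError) on a machine whose preferred encoding cannot represent the text. A minimal sketch of how one might check the locale's encoding and pin the file encoding explicitly instead; the temporary path is a hypothetical stand-in for /tmp/test, and choosing UTF-8 is an assumption, not something the question requires:

```python
# -*- coding: utf-8 -*-
import io
import locale
import os
import tempfile

text = u"中文" * 4

# Hypothetical scratch path, used so the example does not overwrite /tmp/test.
path = os.path.join(tempfile.mkdtemp(), "test.txt")

# The encoding io.open() falls back to when no encoding is passed.
print(locale.getpreferredencoding(False))

# Passing encoding explicitly removes the dependency on the locale.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Read the raw bytes back to see what actually landed on disk.
with io.open(path, "rb") as f:
    raw = f.read()
```

Here raw holds the UTF-8 bytes of the text regardless of the locale settings, and the write call itself is still free of .encode().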

On Python 3:

  • you don't need the explicit encoding declaration (# -*- coding: utf-8 -*-) to use literal non-ASCII characters in your Python source code. UTF-8 is the default source encoding.

  • you don't need import io: the builtin open() is io.open() there

  • you don't need the u prefix (u''). '' literals are Unicode by default. If you want to omit u'' on Python 2 as well, put back from __future__ import unicode_literals, as in the code in your question.

i.e., the complete Python 3 code is:

#!/usr/bin/env python3

with open('/tmp/test', 'w') as file:
    file.write("中文" * 4)
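If the goal is simply the least code on Python 3, pathlib offers a one-call alternative. This is not part of the answer above, only a sketch: the temporary path is a hypothetical stand-in for /tmp/test, and the explicit UTF-8 encoding is an assumption.

```python
import tempfile
from pathlib import Path

# Hypothetical scratch path, used so the example does not overwrite /tmp/test.
path = Path(tempfile.mkdtemp()) / "test.txt"

# write_text() opens the file in text mode, writes the string, and closes it.
written = path.write_text("中文" * 4, encoding="utf-8")

# Read the text back with the same encoding to confirm the round trip.
content = path.read_text(encoding="utf-8")
```

write_text() returns the number of characters written, so written counts characters rather than bytes.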

Upvotes: 4

DougieHauser

Reputation: 470

What about this solution?

Write to UTF-8 file in Python

Only three lines of code.

Upvotes: 0
