Reputation: 9893
I want to make sure all strings in my code are Unicode, so I use unicode_literals. Then I need to write a string to a file:
from __future__ import unicode_literals
with open('/tmp/test', 'wb') as f:
    f.write("中文")  # UnicodeEncodeError
So I need to do this:
from __future__ import unicode_literals
with open('/tmp/test', 'wb') as f:
    f.write("中文".encode("utf-8"))
    f.write("中文".encode("utf-8"))
    f.write("中文".encode("utf-8"))
    f.write("中文".encode("utf-8"))
But then I have to call .encode() on every write. I am lazy, so I switched to codecs:
from __future__ import unicode_literals
from codecs import open
import locale
lang, encoding = locale.getdefaultlocale()
with open('/tmp/test', 'wb', encoding) as f:
    f.write("中文")
I still think this is too much just to write to a file. Is there an easier way?
Upvotes: 6
Views: 9503
Reputation: 414079
You don't need to call .encode(), and you don't need to call locale.getdefaultlocale() explicitly:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import io
with io.open('/tmp/test', 'w') as file:
    file.write(u"中文" * 4)
io.open() uses the character encoding returned by locale.getpreferredencoding(False) to save Unicode text to the file.
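To check which encoding that is on a given machine, something like the following sketch may help. The scratch path and the variable names are made up for illustration; the comparison goes through codecs.lookup() only to normalize spellings such as 'UTF-8' and 'utf-8'.

```python
from __future__ import print_function

import codecs
import io
import locale
import os
import tempfile

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'test')

# Encoding that io.open() falls back to when no encoding argument is passed.
preferred = locale.getpreferredencoding(False)

with io.open(path, 'w') as f:
    # Text-mode file objects expose the encoding they actually use.
    actual = f.encoding

# Normalize the two names before comparing them.
same = codecs.lookup(actual).name == codecs.lookup(preferred).name
print(preferred, actual, same)
```

If the reported encoding cannot represent the text (for example an ASCII locale), the write in the answer's code would fail, which is a reason to check it first.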
On Python 3:
You don't need the explicit encoding declaration (# -*- coding: utf-8 -*-) to use literal non-ASCII characters in your Python source code; UTF-8 is the default source encoding.
You don't need import io: the builtin open() is io.open() there.
You don't need u'' (the u prefix): '' literals are Unicode by default. If you want to omit u'' on Python 2 as well, put back from __future__ import unicode_literals as in the code in your question.
That is, the complete Python 3 code is:
#!/usr/bin/env python3
with open('/tmp/test', 'w') as file:
    file.write("中文" * 4)
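If the file has to be UTF-8 regardless of the machine's locale, one option is to pass the encoding explicitly instead of relying on the default. A minimal Python 3 sketch; the scratch path and variable names are illustrative and not part of the code above:

```python
import os
import tempfile

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'test')
text = "中文" * 4

# An explicit encoding makes the bytes on disk independent of the locale.
with open(path, 'w', encoding='utf-8') as f:
    f.write(text)

# Read the raw bytes back to see what was actually stored.
with open(path, 'rb') as f:
    raw = f.read()

# Read the text back using the same encoding that was used for writing.
with open(path, encoding='utf-8') as f:
    round_tripped = f.read()
```

The reader then has to use the same encoding, as the last open() call does.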
Upvotes: 4