Reputation: 34016
I hit a wall here. I need to redirect all output to a file, but I need this file to be encoded in UTF-8. The problem is that when using codecs.open:
import codecs
import os
import sys
# errLog = io.open(os.path.join(os.getcwdu(), u'BashBugDump.log'), 'w',
#                  encoding='utf-8')
errLog = codecs.open(os.path.join(os.getcwdu(), u'BashBugDump.log'),
                     'w', encoding='utf-8')
sys.stdout = errLog
sys.stderr = errLog
codecs.open opens the file in binary mode, which results in bare \n line terminators instead of the \r\n that text mode produces on Windows. I tried using io.open, but that does not play well with the print statement used all over the codebase (see Python 2.7: print doesn't speak unicode to the io module? or python: TypeError: can't write str to text stream).
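To make the newline difference concrete, here is a small sketch (it runs on Python 2.7 and 3) that writes the same text through codecs.open and through io.open, then reads both files back as raw bytes. The function name write_with_both and the file names are made up for illustration. newline='\r\n' is passed explicitly so the result does not depend on the platform; io.open's default (newline=None) translates '\n' to os.linesep instead.

```python
import codecs
import io
import os


def write_with_both(dirpath, text=u'line one\nline two\n'):
    """Write the same unicode text via codecs.open and io.open; return each file's raw bytes."""
    codecs_path = os.path.join(dirpath, 'codecs.log')
    io_path = os.path.join(dirpath, 'io.log')

    # codecs.open always opens the underlying file in binary mode,
    # so '\n' is written unchanged on every platform.
    f = codecs.open(codecs_path, 'w', encoding='utf-8')
    f.write(text)
    f.close()

    # io.open in text mode translates each '\n' to the requested newline.
    g = io.open(io_path, 'w', encoding='utf-8', newline='\r\n')
    g.write(text)
    g.close()

    with open(codecs_path, 'rb') as fh:
        codecs_bytes = fh.read()
    with open(io_path, 'rb') as fh:
        io_bytes = fh.read()
    return codecs_bytes, io_bytes
```

The codecs.open file ends up with plain LF endings, while the io.open file gets CRLF; the catch, as described above, is that the io version only accepts unicode, which the Python 2 print statement does not always send.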
I am not the only one having this issue (see here, for instance), but the solution adopted there is specific to the logging module, which we do not use.
See also this won't-fix Python bug: https://bugs.python.org/issue2131
So what is the one right way to do this in Python 2?
Upvotes: 4
Views: 2631
Reputation: 177554
Redirection is a shell operation. You don't have to change the Python code at all, but you do have to tell Python which encoding to use when its output is redirected. That is done with the PYTHONIOENCODING environment variable. The following commands (Windows cmd syntax) redirect both stdout and stderr of test.py to a UTF-8-encoded file:
set PYTHONIOENCODING=utf8
python test.py >out.txt 2>&1
test.py:
#coding:utf8
import sys
print u"我不喜欢你女朋友!"
print >>sys.stderr, u"你需要一个新的。"
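Contents of out.txt (the two lines mean "I don't like your girlfriend!" and "You need a new one."):
我不喜欢你女朋友!
你需要一个新的。
Hexdump of out.txt:
0000: E6 88 91 E4 B8 8D E5 96 9C E6 AC A2 E4 BD A0 E5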
0010: A5 B3 E6 9C 8B E5 8F 8B EF BC 81 0D 0A E4 BD A0
0020: E9 9C 80 E8 A6 81 E4 B8 80 E4 B8 AA E6 96 B0 E7
0030: 9A 84 E3 80 82 0D 0A
Note: You do need to print Unicode strings for this to work. Print byte strings and you get the bytes you print.
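The same redirect can be reproduced from Python itself, which is handy for checking the behaviour without a shell. This is a sketch under my own assumptions: run_child and the CHILD one-liner are hypothetical stand-ins for test.py, the child is started with subprocess.Popen using a copy of the environment with PYTHONIOENCODING set, and stderr is merged into stdout the way 2>&1 does. Because stdout is buffered and stderr is not, the order of the two lines in the merged output is not guaranteed.

```python
import os
import subprocess
import sys

# Hypothetical child program: one unicode line to stdout, one to stderr
# (the \\u escapes become real \u escapes inside the child's source).
CHILD = (
    "from __future__ import print_function; import sys; "
    "print(u'\\u6211\\u4e0d\\u559c\\u6b22'); "
    "print(u'\\u4f60\\u9700\\u8981', file=sys.stderr)"
)


def run_child(encoding='utf8'):
    """Run CHILD with PYTHONIOENCODING set; return its combined stdout/stderr bytes."""
    env = dict(os.environ, PYTHONIOENCODING=encoding)
    proc = subprocess.Popen([sys.executable, '-c', CHILD], env=env,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)  # like 2>&1
    out, _ = proc.communicate()
    return out
```

Both lines come back as UTF-8 bytes, because PYTHONIOENCODING applies to stdout and stderr alike.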
codecs.open may force binary mode, but codecs.getwriter doesn't. Give it a file opened in text mode:
#coding:utf8
import sys
import codecs
sys.stdout = sys.stderr = codecs.getwriter('utf8')(open('out.txt','w'))
print u"我不喜欢你女朋友!"
print >>sys.stderr, u"你需要一个新的。"
(same output and hexdump as above)
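What the wrapper does is small: codecs.getwriter('utf8') returns a StreamWriter class, and each instance encodes the unicode it receives and passes the bytes to the wrapped stream. It does no newline translation itself; that is left to the wrapped file, which is why a text-mode file gives \r\n on Windows under Python 2. The sketch below swaps the file for an in-memory io.BytesIO (my assumption, purely so the encoded bytes can be inspected); encode_through_writer is a hypothetical helper name.

```python
import codecs
import io

# getwriter returns a class; calling it wraps an existing byte-accepting stream.
Utf8Writer = codecs.getwriter('utf8')


def encode_through_writer(text):
    """Write unicode text through a UTF-8 StreamWriter and return the bytes it produced."""
    buf = io.BytesIO()          # stand-in for the file object
    writer = Utf8Writer(buf)
    writer.write(text)          # encodes to UTF-8, then calls buf.write(...)
    return buf.getvalue()
```

For example, encode_through_writer(u'\u6211\u4e0d\n') yields E6 88 91 E4 B8 8D 0A, the same bytes that open the hexdump above, with the newline left as a single 0A.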
Upvotes: 5
Reputation: 55469
It appears that the Python 2 version of io doesn't play well with the print statement, but it will work if you use the print function.
Demo:
# -*- coding: utf-8 -*-
from __future__ import print_function
import sys
import io
errLog = io.open('test.log', mode='wt', buffering=1, encoding='utf-8', newline='\r\n')
sys.stdout = errLog
print(u'This is a ™ test')
print(u'Another © line')
Contents of 'test.log':
This is a ™ test
Another © line
Hexdump of 'test.log':
00000000 54 68 69 73 20 69 73 20 61 20 e2 84 a2 20 74 65 |This is a ... te|
00000010 73 74 0d 0a 41 6e 6f 74 68 65 72 20 c2 a9 20 6c |st..Another .. l|
00000020 69 6e 65 0d 0a |ine..|
00000025
I ran this code on Python 2.6 on Linux, YMMV.
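If you redirect only for part of a run (or in tests), it helps to put the real stdout back afterwards. Here is a sketch of the same io.open approach that does that; print_to_log is a hypothetical helper, and it assumes every line passed in is unicode, since on Python 2 the print function only writes unicode separators when at least one argument is unicode.

```python
from __future__ import print_function
import io
import sys


def print_to_log(path, lines):
    """Temporarily point sys.stdout at a UTF-8, CRLF log file and print unicode lines into it."""
    log = io.open(path, mode='wt', encoding='utf-8', newline='\r\n')
    saved = sys.stdout
    sys.stdout = log
    try:
        for line in lines:
            # print() writes the line plus '\n'; the text wrapper turns '\n' into '\r\n'
            print(line)
    finally:
        sys.stdout = saved  # always restore the real stdout
        log.close()
```

Calling print_to_log('test.log', [u'This is a \u2122 test', u'Another \u00a9 line']) should produce the same bytes as the hexdump above.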
If you don't want to use the print function, you can implement your own file-like encoding class.
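# -*- coding: utf-8 -*-
import sys

class Encoder(object):
    def __init__(self, fname):
        self.file = open(fname, 'wb')

    def write(self, s):
        # print passes both byte strings and unicode; encode only the unicode ones
        if isinstance(s, unicode):
            s = s.encode('utf-8')
        self.file.write(s.replace('\n', '\r\n'))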
errlog = Encoder('test.log')
sys.stdout = errlog
sys.stderr = errlog
print 'hello\nthere'
print >>sys.stderr, u'This is a ™ test'
print u'Another © line'
print >>sys.stderr, 1, 2, 3, 4
print 5, 6, 7, 8
Contents of 'test.log':
hello
there
This is a ™ test
Another © line
1 2 3 4
5 6 7 8
Hexdump of 'test.log':
00000000 68 65 6c 6c 6f 0d 0a 74 68 65 72 65 0d 0a 54 68 |hello..there..Th|
00000010 69 73 20 69 73 20 61 20 e2 84 a2 20 74 65 73 74 |is is a ... test|
00000020 0d 0a 41 6e 6f 74 68 65 72 20 c2 a9 20 6c 69 6e |..Another .. lin|
00000030 65 0d 0a 31 20 32 20 33 20 34 0d 0a 35 20 36 20 |e..1 2 3 4..5 6 |
00000040 37 20 38 0d 0a |7 8..|
00000045
Please bear in mind that this is just a quick demo. You may want a more sophisticated way to handle newlines; e.g., you probably don't want to replace \n if it's already preceded by \r. On the other hand, with normal Python text handling that shouldn't be an issue...
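One way to avoid doubling an existing \r\n is to replace only the \n characters that are not already preceded by \r, using a negative lookbehind. This is a sketch; to_crlf is a hypothetical name, and Encoder.write above could call it in place of s.replace('\n', '\r\n').

```python
import re

# Matches a '\n' that is NOT already preceded by '\r'.
_LONE_LF = re.compile(u'(?<!\r)\n')


def to_crlf(text):
    """Convert bare LF line endings to CRLF, leaving existing CRLF pairs untouched."""
    return _LONE_LF.sub(u'\r\n', text)
```

One limitation: the lookbehind only sees the current string, so if one write() call ends with '\r' and the next one starts with '\n', the pair is not recognised and you still get '\r\r\n'.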
Here's yet another version, which combines the two previous strategies. I don't know whether it's any faster than the second version.
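# -*- coding: utf-8 -*-
import sys
import io

class Encoder(object):
    def __init__(self, fname):
        # io.open does the UTF-8 encoding and the '\n' -> '\r\n' translation
        self.file = io.open(fname, mode='wt', encoding='utf-8', newline='\r\n')

    def write(self, s):
        # io text files only accept unicode; unicode(s) assumes byte strings are ASCII
        self.file.write(unicode(s))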
errlog = Encoder('test.log')
sys.stdout = errlog
sys.stderr = errlog
print 'hello\nthere'
print >>sys.stderr, u'This is a ™ test'
print u'Another © line'
print >>sys.stderr, 1, 2, 3, 4
print 5, 6, 7, 8
This produces the same output as the previous version.
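A caveat with unicode(s): on Python 2 it decodes byte strings with the default ASCII codec, so printing a non-ASCII byte string (say, a literal without the u prefix in a UTF-8 source file) raises UnicodeDecodeError. Below is a sketch of a variant that decodes byte strings as UTF-8 instead and also runs unchanged on Python 3. The class name Utf8CrlfLog is hypothetical, and the assumption that every byte string written to it is UTF-8 is mine.

```python
import io


class Utf8CrlfLog(object):
    """File-like target for sys.stdout/sys.stderr: accepts byte or unicode
    strings and writes UTF-8 text with CRLF line endings."""

    def __init__(self, fname):
        self.file = io.open(fname, mode='wt', encoding='utf-8', newline='\r\n')

    def write(self, s):
        if isinstance(s, bytes):    # 'str' on Python 2, 'bytes' on Python 3
            s = s.decode('utf-8')   # assumption: byte strings are UTF-8
        self.file.write(s)

    def flush(self):
        self.file.flush()

    def close(self):
        self.file.close()
```

On Python 2 it is used like the classes above, e.g. sys.stdout = sys.stderr = Utf8CrlfLog('test.log'), after which the ordinary print statement works for both str and unicode arguments.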
Upvotes: 1