Reputation: 13456
I'm trying to left-align a UTF-8 encoded string with string.ljust. This exception is raised: UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 0: ordinal not in range(128). For example,
from sys import stdout
s = u"你好"  # a Chinese string
stdout.write(s.encode("UTF-8").ljust(20))
Am I on the right track? Or should I use another approach to format the string?
Thanks and Best Regards.
Upvotes: 1
Views: 2671
Reputation: 178409
Did you post the exact code and the exact error you received? Your code works without raising an error on both a cp437 and a utf-8 terminal. In any case, you should justify the Unicode string before encoding it and sending it to the terminal. Note the difference: the Chinese string has length 6 once UTF-8-encoded instead of length 2, so justifying after encoding adds fewer spaces:
>>> sys.stdout.write(s.encode('utf-8').ljust(20) + "hello")
你好              hello
>>> sys.stdout.write(s.ljust(20).encode('utf-8') + "hello")
你好                  hello
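The length difference can be checked directly. The following is a minimal sketch, written to run on both Python 2.7 and Python 3; the variable names are only illustrative:

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

s = u"你好"                    # Unicode text: 2 characters
encoded = s.encode("utf-8")    # UTF-8 bytes: 3 bytes per character here

print(len(s))                  # 2
print(len(encoded))            # 6

# Padding after encoding counts bytes, so only 14 spaces are added.
padded_bytes = encoded.ljust(20)
# Padding before encoding counts characters, so 18 spaces are added.
padded_text = s.ljust(20)

print(len(padded_bytes) - len(encoded))   # 14
print(len(padded_text) - len(s))          # 18
```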
Note also that Chinese characters are wider than the other characters in typical fixed-width fonts so things may still not line up as you like if mixing languages (see this answer for a solution):
>>> sys.stdout.write("12".ljust(20) + "hello")
12                  hello
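One possible way to compensate for the wider glyphs is to pad by estimated display columns rather than by character count. This is only a sketch, not necessarily the approach the linked answer takes: display_width and ljust_display are hypothetical helper names, and the estimate assumes that characters classified as wide ("W") or fullwidth ("F") by unicodedata.east_asian_width occupy two terminal columns. It ignores combining characters and depends on the terminal font.

```python
# -*- coding: utf-8 -*-
import unicodedata


def display_width(text):
    """Estimate terminal columns: wide/fullwidth characters count as 2."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)


def ljust_display(text, width, fill=u" "):
    """Left-justify by estimated display columns (hypothetical helper)."""
    return text + fill * max(0, width - display_width(text))


# Both rows are padded to the same estimated display width of 20 columns.
row_cjk = ljust_display(u"你好", 20) + u"hello"
row_ascii = ljust_display(u"12", 20) + u"hello"
```

With this estimate, the Chinese string gets 16 spaces and "12" gets 18, so "hello" should start in the same column on terminals that render CJK characters at double width.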
Normally you can skip explicit encoding to stdout. Python implicitly encodes Unicode strings to the terminal in the terminal's encoding (see sys.stdout.encoding):
sys.stdout.write(s.ljust(20))
Another option is using print:
print "%-20s" % s # old-style; the "-" flag left-aligns
or:
print u'{:20}'.format(s) # new-style; the u prefix avoids an implicit ASCII encode
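For reference, here is a small sketch of how the alignment flags behave in both formatting styles. It builds the strings without printing them, so it runs on Python 2.7 and Python 3 alike; the variable names are illustrative. Note that a bare width such as %20s right-aligns, while %-20s left-aligns, and that str.format left-aligns strings by default.

```python
# -*- coding: utf-8 -*-
s = u"你好"

left_old = u"%-20s" % s            # "-" flag: pad on the right (left-align)
right_old = u"%20s" % s            # bare width: pad on the left (right-align)
left_new = u"{:<20}".format(s)     # "<": left-align (the default for strings)
default_new = u"{:20}".format(s)   # same result as "<" for a string
right_new = u"{:>20}".format(s)    # ">": right-align
```

All of these pad by character count, so the remarks above about double-width characters still apply.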
Upvotes: 5