Summer_More_More_Tea

Reputation: 13456

How to left-align a UTF-8 encoded string in Python?

I'm trying to left-align a UTF-8 encoded string with string.ljust, but this exception is raised: UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 0: ordinal not in range(128). For example:

from sys import stdout

s = u"你好"    # a Chinese string
stdout.write(s.encode("UTF-8").ljust(20))

Am I on the right track, or should I use another approach to format the string?

Thanks and Best Regards.

Upvotes: 1

Views: 2671

Answers (1)

Mark Tolonen

Reputation: 178409

Did you post the exact code and the exact error you received? Your code works without raising an error on both a cp437 and a UTF-8 terminal. In any case, you should justify the Unicode string before encoding it and sending it to the terminal. Note the difference: the Chinese string has length 6 once encoded as UTF-8 instead of length 2, so justifying the encoded bytes adds only 14 spaces where justifying the Unicode string adds 18:

>>> sys.stdout.write(s.encode('utf-8').ljust(20) + "hello")
你好              hello
>>> sys.stdout.write(s.ljust(20).encode('utf-8') + "hello")
你好                  hello
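The difference in padding follows directly from the two lengths. A minimal sketch of the arithmetic, written so that it should also run on Python 3 (an assumption, since the session above is Python 2); the variable names are only illustrative:

```python
# -*- coding: utf-8 -*-
s = u"你好"                    # two code points
encoded = s.encode("utf-8")    # six bytes, three per character

# ljust on the encoded bytes counts bytes: 20 - 6 = 14 spaces of padding.
padded_bytes = encoded.ljust(20)

# ljust on the Unicode string counts characters: 20 - 2 = 18 spaces.
padded_text = s.ljust(20)

byte_padding = len(padded_bytes) - len(encoded)
text_padding = len(padded_text) - len(s)
```

Here byte_padding is 14 and text_padding is 18, which matches the two output lines above.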

Note also that Chinese characters are wider than other characters in typical fixed-width fonts, so things may still not line up as you like when mixing languages (see this answer for a solution). Compare the same padding applied to an ASCII string:

>>> sys.stdout.write("12".ljust(20) + "hello")
12                  hello
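One possible way to compensate for wide characters, not necessarily the approach taken in the linked answer, is to pad by estimated display columns rather than by character count. The sketch below uses the standard unicodedata.east_asian_width function and assumes that Wide and Fullwidth characters take two columns and everything else takes one. The helper names display_width and ljust_display are hypothetical, and combining or zero-width characters are not handled:

```python
# -*- coding: utf-8 -*-
import unicodedata


def display_width(text):
    """Estimate how many terminal columns a Unicode string occupies."""
    # Assumption: East Asian Wide (W) and Fullwidth (F) characters take
    # two columns; every other character is counted as one column.
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)


def ljust_display(text, width, fillchar=u" "):
    """Left-justify text to a width measured in columns, not characters."""
    padding = max(0, width - display_width(text))
    return text + fillchar * padding


chinese = ljust_display(u"你好", 20) + u"hello"
ascii_text = ljust_display(u"12", 20) + u"hello"
```

With this estimate, u"你好" is padded with 16 spaces and u"12" with 18, so "hello" should start in the same column on a terminal that renders CJK characters at double width.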

Normally you can skip explicit encoding to stdout. Python implicitly encodes Unicode strings to the terminal in the terminal's encoding (see sys.stdout.encoding):

sys.stdout.write(s.ljust(20))
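The implicit encoding depends on sys.stdout.encoding being set. In Python 2 it is None when output is piped or redirected, and writing a non-ASCII Unicode string then fails. A hedged sketch of an explicit fallback follows; the helper name encode_for_stdout and the UTF-8 default are assumptions, not part of the original answer:

```python
# -*- coding: utf-8 -*-
import sys


def encode_for_stdout(text, fallback="utf-8"):
    """Encode text with stdout's encoding, or with fallback if it has none."""
    encoding = getattr(sys.stdout, "encoding", None) or fallback
    # "replace" substitutes characters the target encoding cannot represent
    # instead of raising UnicodeEncodeError.
    return text.encode(encoding, "replace")


# Justify the Unicode string first, then encode the finished line.
line = encode_for_stdout(u"你好".ljust(20) + u"hello\n")
```

The resulting bytes can be passed to sys.stdout.write in Python 2, or to sys.stdout.buffer.write in Python 3.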

Another option is using print:

print "%-20s" % s   # old-style; the "-" flag left-aligns (plain %20s right-aligns)

or:

print u'{:20}'.format(s)  # new-style; the u prefix avoids an implicit ASCII encode of s in Python 2
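With %-formatting, the - flag selects left alignment, and without it the field is right-aligned. With str.format, strings are left-aligned by default, and < or > makes the choice explicit. A short sketch comparing the two styles, using Unicode format strings so that it behaves the same on Python 2 and Python 3; the trailing | is only there to make the padding visible:

```python
# -*- coding: utf-8 -*-
s = u"你好"

left_old = u"%-20s|" % s           # "-" flag: padding goes on the right
right_old = u"%20s|" % s           # no flag: padding goes on the left
left_new = u"{:<20}|".format(s)    # "<" is the default for strings
right_new = u"{:>20}|".format(s)   # ">" right-aligns
```

Both styles count characters, not display columns, so each result contains 18 spaces of padding.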

Upvotes: 5
