Python's .format() minilanguage and Unicode

Question

I'm trying to use some simple Unicode characters in a command-line program I'm writing, but drawing them into a table is difficult because Python appears to treat single-character symbols as multi-character strings.

For example, if I try to print(u"\u2714".encode("utf-8")) I see the Unicode check mark. However, if I try to add some padding to that character (as one might in a tabular structure), Python seems to interpret this single-character string as a 3-character one. All three of these lines print the same thing:

print("|{:1}|".format(u"\u2714".encode("utf-8")))
print("|{:2}|".format(u"\u2714".encode("utf-8")))
print("|{:3}|".format(u"\u2714".encode("utf-8")))

Now I think I understand why this is happening: it's a multibyte string. My question is, how do I get Python to pad this string appropriately?

Dan D. · Accepted Answer

Don't call encode("utf-8") at that point; do it later:

>>> u"\u2714".encode("utf-8")
'\xe2\x9c\x94'

The UTF-8 encoding of this character is three bytes long, so format() pads it as a 3-character byte string. Compare how format works with Unicode strings:

>>> u"|{:1}|".format(u"\u2714")
u'|\u2714|'
>>> u"|{:2}|".format(u"\u2714")
u'|\u2714 |'
>>> u"|{:3}|".format(u"\u2714")
u'|\u2714  |'

Tested on Python 2.7.3.
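
As a rough sketch of the same idea (the variable names are illustrative and not from the original post): pad while the value is still a Unicode string, and encode only where bytes are actually needed. The u"" prefixes are an assumption made so the snippet stays valid on both Python 2.7 and Python 3.3+.

```python
check = u"\u2714"                # a single Unicode code point
encoded = check.encode("utf-8")  # its UTF-8 form: three bytes

# Pad the Unicode string, so the width is counted in characters
row = u"|{:3}|".format(check)

# Encode the finished row only at the output boundary
row_bytes = row.encode("utf-8")

print(len(check))      # length in characters
print(len(encoded))    # length in bytes
print(len(row))        # padded row, counted in characters
print(len(row_bytes))  # same row, counted in bytes
```

Note that this counts code points, not terminal columns, so characters that a terminal renders wider than one cell may still need separate handling.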
