I'm trying to use some of the simple Unicode characters in a command line program I'm writing, but drawing them into a table becomes difficult because Python appears to be treating single-character symbols as multi-character strings.
For example, if I run
print(u"\u2714".encode("utf-8"))
I see the Unicode check mark. However, if I add some padding to that character (as one might in a tabular layout), Python seems to interpret this single-character string as a three-character one. All three of these lines print the same thing:
print("|{:1}|".format(u"\u2714".encode("utf-8")))
print("|{:2}|".format(u"\u2714".encode("utf-8")))
print("|{:3}|".format(u"\u2714".encode("utf-8")))
Now I think I understand why this is happening: it's a multibyte string. My question is, how do I get Python to pad this string appropriately?
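The character-versus-byte distinction is easy to see directly. The sketch below assumes Python 3, where `str` holds text and `bytes` holds encoded data (Python 2's plain `str` plays the role of `bytes` here). Python 3's `format` rejects a width for `bytes` outright, so `bytes.ljust` stands in for the padding:

```python
# Python 3 sketch: str lengths count code points, bytes lengths count bytes.
check = "\u2714"                 # HEAVY CHECK MARK, a single code point
encoded = check.encode("utf-8")  # its UTF-8 form, three bytes

print(len(check))    # 1
print(len(encoded))  # 3

# Padding is measured in the object's own units. bytes.ljust counts bytes,
# so widths 1 to 3 add nothing, just like the Python 2 byte strings above.
for width in (1, 2, 3):
    print(encoded.ljust(width) == encoded)  # True

# Padding the text first, then encoding, counts characters instead.
print(check.ljust(3).encode("utf-8"))  # b'\xe2\x9c\x94  '
```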
Make your format strings Unicode, and pass the Unicode string itself rather than its UTF-8 encoding:
from __future__ import print_function
print(u"|{:1}|".format(u"\u2714"))
print(u"|{:2}|".format(u"\u2714"))
print(u"|{:3}|".format(u"\u2714"))
outputs:
|✔|
|✔ |
|✔  |
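Applied to the table in the question, the same idea gives a small row formatter. This is a Python 3 sketch; `format_row` and the sample rows are hypothetical, not part of the answer:

```python
def format_row(cells, widths):
    """Pad each text cell to its column width and join the row with '|'.

    Widths are counted in code points, because the cells are str, not bytes.
    """
    padded = ("{:{w}}".format(cell, w=w) for cell, w in zip(cells, widths))
    return "|" + "|".join(padded) + "|"


rows = [("\u2714", "done"), ("\u2718", "failed")]
for row in rows:
    print(format_row(row, [3, 8]))
# |✔  |done    |
# |✘  |failed  |
```

One caveat: `format` counts code points, not terminal columns, so characters that a terminal draws two columns wide would still misalign; measuring display width is outside this sketch.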
Don't call encode('utf-8') at that point; do it later, after formatting:
>>> u"\u2714".encode("utf-8")
'\xe2\x9c\x94'
The UTF-8 encoding is three bytes long, so padding it to a width of 1, 2, or 3 adds nothing. Compare how format works with Unicode strings:
>>> u"|{:1}|".format(u"\u2714")
u'|\u2714|'
>>> u"|{:2}|".format(u"\u2714")
u'|\u2714 |'
>>> u"|{:3}|".format(u"\u2714")
u'|\u2714  |'
Tested on Python 2.7.3.
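The same "format first, encode last" order carries over to Python 3, where it becomes mandatory. A minimal sketch, assuming Python 3 and using `io.BytesIO` as a stand-in for wherever the encoded bytes are finally written:

```python
import io

sink = io.BytesIO()  # stand-in for any byte-oriented output: file, pipe, socket

line = "|{:3}|\n".format("\u2714")  # 1. pad while the value is still text
sink.write(line.encode("utf-8"))    # 2. encode once, at the output boundary
print(sink.getvalue())              # b'|\xe2\x9c\x94  |\n'

# Encoding first does not even get as far as padding in Python 3:
# bytes inherits object.__format__, which rejects any non-empty format spec.
try:
    "|{:3}|".format("\u2714".encode("utf-8"))
except TypeError:
    print("encode-then-format raises TypeError")
```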