Daniel Quinn

Reputation: 6418

Python's .format() minilanguage and Unicode

I'm trying to use some simple Unicode characters in a command-line program I'm writing, but drawing them into a table is difficult because Python appears to treat single-character symbols as multi-character strings.

For example, if I print(u"\u2714".encode("utf-8")) I see the Unicode check mark. However, if I try to add some padding to that character (as one might in a tabular structure), Python seems to interpret this single-character string as a 3-character one. All three of these lines print the same thing:

print("|{:1}|".format(u"\u2714".encode("utf-8")))
print("|{:2}|".format(u"\u2714".encode("utf-8")))
print("|{:3}|".format(u"\u2714".encode("utf-8")))

Now I think I understand why this is happening: it's a multibyte string. My question is, how do I get Python to pad this string appropriately?

Upvotes: 2

Views: 332

Answers (2)

chucksmash

Reputation: 6017

Make your format strings unicode, and pass the unicode character itself rather than its UTF-8 encoding:

from __future__ import print_function

print(u"|{:1}|".format(u"\u2714"))
print(u"|{:2}|".format(u"\u2714"))
print(u"|{:3}|".format(u"\u2714"))

outputs:

|✔|
|✔ |
|✔  |
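
The same idea extends to a whole table row. This is only a sketch: it assumes every cell is kept as a unicode (text) string and that each character occupies one terminal column. format_row and its width parameter are hypothetical names, not something from the question.

```python
from __future__ import unicode_literals

def format_row(cells, width=3):
    """Pad each text cell to `width` characters and wrap the cells in '|'.

    Hypothetical helper for illustration. Padding is counted in
    characters (code points), which only works because the cells
    are text strings, not UTF-8 encoded byte strings.
    """
    padded = ["{:{w}}".format(cell, w=width) for cell in cells]
    return "|" + "|".join(padded) + "|"

# Three cells: a check mark, plain ASCII text, and a cross mark.
row = format_row(["\u2714", "ok", "\u2718"])
```

Here row holds the padded line as text; it can be printed directly, or encoded once just before it is written out.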

Upvotes: 2

Dan D.

Reputation: 74655

Don't encode('utf-8') at that point; do it later:

>>> u"\u2714".encode("utf-8")
'\xe2\x9c\x94'

The UTF-8 encoding is three bytes long. Look at how format works with Unicode strings:

>>> u"|{:1}|".format(u"\u2714")
u'|\u2714|'
>>> u"|{:2}|".format(u"\u2714")
u'|\u2714 |'
>>> u"|{:3}|".format(u"\u2714")
u'|\u2714  |'

Tested on Python 2.7.3.
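
To make the byte/character difference concrete, here is a minimal sketch that pads first and encodes last. The variable names are illustrative only, and unicode_literals is used so the same lines should behave the same way on Python 2.7 and Python 3.

```python
from __future__ import print_function, unicode_literals

check = "\u2714"                  # text string: one character
encoded = check.encode("utf-8")   # byte string: three bytes

print(len(check))    # 1
print(len(encoded))  # 3

# Pad while the value is still text, so the width is counted in characters.
row = "|{:3}|".format(check)

# Encode once, at the output boundary, if bytes are needed at all.
row_bytes = row.encode("utf-8")

print(len(row))        # 5: two bars, the check mark, two spaces
print(len(row_bytes))  # 7: the check mark now takes three bytes
```

Padding the encoded value instead would count those three bytes against the field width, which is why widths 1 through 3 all looked identical in the question.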

Upvotes: 1
