Reputation: 23198
I need to make a string fit into a fixed-width gap on the screen. My strings contain Unicode characters, but len() and slicing apparently work on bytes, so I end up cutting the strings too short: € occupies only one column on the screen but counts as 3 for len() and slices.
I have the encoding header set up properly, and I'm willing to use something other than slices or len(), but I really need to know how many columns a string will take and how to cut it to the available width.
$ cat test.py
# -*- coding: utf-8 -*-
a = "2 €uros"
b = "2 Euros"
print len(b)
print len(a)
print a[3:]
print b[3:]
$ python test.py
7
9
��uros
uros
Upvotes: 9
Views: 7858
Reputation: 44361
You're not creating Unicode strings there; you're creating byte strings encoded as UTF-8 (a variable-length encoding, as you're seeing). In Python 2 you need to use literals of the form u"..." (or u'...'). If you do that, you get the expected result:
% cat test.py
# -*- coding: utf-8 -*-
a = u"2 €uros"
b = u"2 Euros"
print len(b)
print len(a)
print a[3:]
print b[3:]
% python test.py
7
7
uros
uros
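If the text arrives as bytes (from a file or a socket, say) rather than as a literal, the same idea applies: decode first, then measure and slice. Below is a minimal sketch of that, not a definitive implementation. The helper name fit_to_width and its parameters are hypothetical, and it assumes every character occupies exactly one screen column, which holds for € but not for wide East Asian characters or combining marks. The __future__ import is only there so the sketch runs on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals


def fit_to_width(text, width, encoding="utf-8"):
    """Hypothetical helper: cut `text` to at most `width` characters.

    Assumption: each character takes exactly one screen column.
    """
    # Decode byte strings first so that slicing counts characters, not bytes.
    if isinstance(text, bytes):
        text = text.decode(encoding)
    return text[:width]


raw = "2 €uros".encode("utf-8")  # byte string, e.g. as read from a file
fitted = fit_to_width(raw, 4)

print(len(raw))     # length in bytes
print(len(fitted))  # length in characters
print(fitted)
```

Here len(raw) counts UTF-8 bytes, while len(fitted) counts characters, so the slice never lands in the middle of the € sign.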
Upvotes: 17