Python, len and slices on unicode strings

Question

I need to make a string fit in the space allocated for it on the screen. I'm using Unicode text, and len() and slices[] apparently work on bytes, so I end up cutting strings too short: € occupies only one column on the screen but counts as 3 for len() and slices[].

I have the encoding headers properly set up, and I'm willing to use something other than slices or len() to deal with this, but I really need to know how many columns the string will take and how to cut it to fit the available space.

$ cat test.py
# -*- coding: utf-8 -*-
a = "2 €uros"
b = "2 Euros"
print len(b)
print len(a)
print a[3:]
print b[3:]

$ python test.py
7
9
��uros
uros

Nicholas Riley · Accepted Answer

You're not creating Unicode strings there; you're creating byte strings encoded as UTF-8 (which is variable-length, as you're seeing: the € takes 3 bytes). You need to use literals of the form u"..." (or u'...'). If you do that, you get the expected result:

% cat test.py
# -*- coding: utf-8 -*-
a = u"2 €uros"
b = u"2 Euros"
print len(b)
print len(a)
print a[3:]
print b[3:]
% python test.py 
7
7
uros
uros
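
If the text arrives as bytes (from a file or a socket, say) rather than as a literal, the same idea applies: decode it first, then measure and slice. The following is only a minimal sketch of that; the helper name fit and its parameters are made up for illustration, and it assumes the input is valid UTF-8. The print calls are written so that the snippet should run on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function


def fit(raw, width, encoding="utf-8"):
    """Decode a byte string and cut it to at most `width` characters.

    Hypothetical helper for illustration; assumes `raw` is valid `encoding`.
    """
    text = raw.decode(encoding)  # bytes -> Unicode text
    return text[:width]          # slicing now counts characters, not bytes


# The same text as in the question, as UTF-8 bytes.
raw = u"2 \u20acuros".encode("utf-8")

print(len(raw))                      # number of bytes
print(len(raw.decode("utf-8")))      # number of characters
print(fit(raw, 4) == u"2 \u20acu")   # the cut no longer splits the euro sign
```

Note that this counts code points, which matches screen columns for text like the example. Combining characters and wide East Asian characters can occupy zero or two columns, so text containing them would need extra handling.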
