Reputation: 1948
Working with svn logs in xml format i've accidentally got an error in my script. Error message is:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-9: ordinal not in range(128)
By debugging input data i have found what was wrong. Here is an example:
a=u'\u0440\u0435\u044c\u0434\u0437\u0444\u043a\u044b\u0443\u043a \u043c\u0443\u043a\u044b\u0448\u0449\u0442 \u0430\u0448\u0447'
>>> print a
реьдзфкыук мукышщт ашч
>>> print '{}'.format(a)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-9: ordinal not in range(128)
Can you please explain what is wrong with format? Seems like it sees u before string bytes and try to decode it from UTF8. However in Python 3 above example works without error.
Upvotes: 0
Views: 303
Reputation: 1121276
You are mixing Unicode and byte string values. Use a unicode format:
print u'{}'.format(a)
Demo:
>>> a=u'\u0440\u0435\u044c\u0434\u0437\u0444\u043a\u044b\u0443\u043a \u043c\u0443\u043a\u044b\u0448\u0449\u0442 \u0430\u0448\u0447'
>>> print u'{}'.format(a)
реьдзфкыук мукышщт ашч
In Python 3, strings are unicode values by default; in Python 2, u"..."
indicates a unicode value and regular strings ("..."
) are byte strings.
Mixing a byte strings and unicode value results in automatic encoding or decoding with the default codec (ASCII), and that's what happens here. The str.format()
method has to encode the Unicode value to a byte string to interpolate.
Upvotes: 2