Reputation: 7461
I have a function which works with unicode internally, and I would like to test it using py.test. Currently, I have the following code:
def test_num2word():
    assert num2word(2320) == u"dva tisíce tři sta dvacet"
However, the assertion fails with:
E assert u'dva tis\xed...i sta dvacet ' == u'dva tis\xc3\...9i sta dvacet'
E - dva tis\xedce t\u0159i sta dvacet
E ? ^ ^ -
E + dva tis\xc3\xadce t\xc5\x99i sta dvacet
E ?
As I understand it, my function correctly returns unicode, which is then compared to a UTF-8 encoded string, which (obviously) fails. Yet I thought using u"..." in my source would also convert the string to the same encoding used internally by Python.
My question is: is there a sane way of comparing these, or do I need to pepper each test statement with a decode('utf-8') (on the right-hand side) or an encode('utf-8') (on the left side)? Even if I write a wrapper function, this doesn't strike me as ideal -- there must be a way to compare this sanely! No, using Python 3 is not an option.
Upvotes: 0
Views: 79
Reputation: 536577
It's not clear from your error, but it looks like in:
    assert u'dva tis\xed...i sta dvacet ' == u'dva tis\xc3\...9i sta dvacet'
both of those strings have u on the front, so they're unicode strings(*). But one contains mangled content: dva tisÃce tÅi sta dvacet.
If that string is the one it's getting from your test py file then the problem is that the source code itself is not being read using the same encoding as you used to save it. This can be solved in two ways:
Save as UTF-8 in your text editor, and include the line # -*- coding: utf-8 -*- at the top of your file (see this question).
Use escape sequences in the string literal to avoid relying on a source file encoding:
    assert num2word(2320) == u'dva tis\u00edce t\u0159i sta dvacet'
(*: In what encoding they're stored internally in memory is a long story, but that's not really something you usually have to worry about as a Python programmer.)
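To see how the mangled string can arise, here is a minimal sketch. It assumes the UTF-8 bytes of the source file were decoded as Latin-1; that is only a guess, since the question doesn't show which encoding was actually applied. The variable names are made up for illustration, and the literals use escapes so the snippet itself doesn't depend on any source file encoding.

```python
# -*- coding: utf-8 -*-
# The intended text, written with escapes so this file stays pure ASCII.
expected = u'dva tis\u00edce t\u0159i sta dvacet'

# The bytes a UTF-8 editor writes to disk for that text.
utf8_bytes = expected.encode('utf-8')

# Assumption: the bytes are decoded with the wrong codec (Latin-1 here),
# so every byte turns into its own character.
mangled = utf8_bytes.decode('latin-1')

print(repr(expected))
print(repr(mangled))

# Reversing the wrong decode recovers the original text.
repaired = mangled.encode('latin-1').decode('utf-8')
print(repaired == expected)
```

Under that assumption, mangled has the same \xc3\xad and \xc5\x99 sequences that appear on the right-hand side of the failing assertion, which is why both sides are unicode strings yet compare unequal.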
Upvotes: 1