Python: decoding immediately after encoding

Question

Found in legacy code:

somevar.encode('utf-8').decode('utf-8')

Is this construct useful for anything other than catching encoding errors?

skrrgwasme · Accepted Answer

Experimenting in the Python 2.7.6 interpreter:

a = u"string"
a

Output: u'string'

b = a.encode('utf-8').decode('utf-8')
b

Output: u'string'

b = a.decode('utf-8').encode('utf-8')
b

Output: 'string'

a = "string"
a

Output: 'string'

b = a.encode('utf-8').decode('utf-8')
b

Output: u'string'

b = a.decode('utf-8').encode('utf-8')
b

Output: 'string'

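Note that whether or not the original string is Unicode, the output of encode -> decode is a Unicode string, while the output of decode -> encode is a byte string (str). The mixed calls above only succeed because the text is pure ASCII: in Python 2, calling encode on a byte string first decodes it implicitly with the ASCII codec (and calling decode on a Unicode string first encodes it with ASCII), so non-ASCII data raises a UnicodeError. One trivial point: since strings are immutable and the line as you posted it discards the return value, it is useless for anything besides checking for UnicodeErrors.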
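To make the "checking for UnicodeErrors" use concrete, here is a minimal sketch. It assumes Python 3 (the experiments above were run on 2.7.6), and the helper name survives_utf8_round_trip is made up for illustration. In Python 3 a str containing a lone surrogate cannot be encoded as UTF-8, so the round trip fails for it.

```python
def survives_utf8_round_trip(text):
    """Return True if text can be encoded to UTF-8 and decoded back unchanged.

    Hypothetical helper, assuming Python 3 str input.
    """
    try:
        # The round trip itself; the result is only compared, not kept.
        return text.encode("utf-8").decode("utf-8") == text
    except UnicodeError:
        # UnicodeEncodeError and UnicodeDecodeError both subclass UnicodeError.
        return False


if __name__ == "__main__":
    print(survives_utf8_round_trip("string"))
    print(survives_utf8_round_trip("\ud800"))  # lone surrogate
```

Wrapping the check in a function keeps the intent explicit, which the bare one-liner from the question does not.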

The only real effect of the encode -> decode construct is that every string passed through it (with the return value kept) comes out as a Unicode string. Why you would do this instead of unicode_string = unicode(normal_string, encoding='UTF-8'), I have no idea.
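If the goal really is "always end up with a Unicode string", an explicit helper states that more clearly than a round trip. The following is only a sketch: the name to_text and its default encoding are assumptions, not something from the legacy code. It relies on bytes being an alias of str in Python 2.6+ and a distinct type in Python 3, so it should behave the same way on both.

```python
def to_text(value, encoding="utf-8"):
    """Return value as a text (Unicode) string.

    Hypothetical helper: byte strings are decoded with the given encoding,
    anything else is returned unchanged.
    """
    if isinstance(value, bytes):
        # Explicit decode, so no implicit ASCII conversion is involved.
        return value.decode(encoding)
    return value


if __name__ == "__main__":
    print(repr(to_text(b"string")))
    print(repr(to_text(u"string")))
```

Unlike the round trip, this never touches a value that is already text, and a decoding failure points directly at the byte string that caused it.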
