Reputation: 18525
The printed results look the same, so what is the point of encoding to and decoding from UTF-8? And should it be encode('utf8') or encode('utf-8')?
u = 'abc'
print(u)
u = u.encode('utf-8')
print(u)
uu = u.decode('utf-8')
print(uu)
Upvotes: 0
Views: 146
Reputation: 6252
In Python 2, if you call encode on a byte string, Python will first try to decode it to unicode (using the default ASCII codec) before it can encode it back to UTF-8. There are also encodings that have nothing to do with character sets and can be applied to 8-bit strings.
For example:
data = u'\u00c3'  # Unicode data
data = data.encode('utf8')
print(repr(data))
# prints '\xc3\x83' in Python 2 (b'\xc3\x83' in Python 3)
Please have a look here and here. It should be helpful.
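As a rough illustration of encodings that are unrelated to character sets, here is a minimal sketch assuming Python 3, where such codecs are reached through the codecs module rather than str.encode. The variable names are only illustrative.

```python
import codecs

raw = b"abc"  # bytes, not text

# base64 and hex are bytes-to-bytes codecs: no character set is involved.
as_base64 = codecs.encode(raw, "base64")
as_hex = codecs.encode(raw, "hex")

# Decoding with the same codec restores the original bytes.
round_trip = codecs.decode(as_base64, "base64")

print(as_base64)   # b'YWJj\n'
print(as_hex)      # b'616263'
print(round_trip)  # b'abc'
```

Both input and output are bytes here, which is why these codecs could be applied directly to 8-bit strings in Python 2.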
Upvotes: 0
Reputation: 26717
str.encode encodes the string (or unicode string) into a series of bytes. In Python 3 the result is a bytes object; in Python 2 it's str again (confusingly). When you encode a unicode string, you are left with bytes, not unicode; remember that UTF-8 is not Unicode, it's an encoding method that turns Unicode code points into bytes.
str.decode (bytes.decode in Python 3) will decode the serialized byte stream with the selected codec, picking the proper Unicode code points and giving you a unicode string.
So, what you're doing in Python 2 is: 'abc' > 'abc' > u'abc', and in Python 3 is: 'abc' > b'abc' > 'abc'. Try printing repr(u) or type(u) as well to see what's changing where.
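To make the change visible, here is a small sketch, assuming Python 3, that prints the type and repr at each stage. It uses a string with one non-ASCII character (an assumption added for illustration, since plain 'abc' is byte-for-byte identical in UTF-8 and so hides the difference).

```python
text = "abc é"                     # str: a sequence of Unicode code points
encoded = text.encode("utf-8")     # bytes: the UTF-8 serialization
decoded = encoded.decode("utf-8")  # str again

for value in (text, encoded, decoded):
    print(type(value).__name__, repr(value))
# str 'abc é'
# bytes b'abc \xc3\xa9'
# str 'abc é'

# One character became two bytes, so the lengths differ.
print(len(text), len(encoded))  # 5 6
```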
utf_8 might be the most canonical spelling, but it doesn't really matter: 'utf8', 'utf-8' and 'utf_8' are all aliases for the same codec.
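A quick way to check that the spellings are interchangeable, sketched here for Python 3 (the list of spellings is just a sample):

```python
import codecs

spellings = ["utf8", "utf-8", "utf_8", "UTF-8"]

# codecs.lookup resolves each alias; collect the codec names it reports.
canonical_names = {codecs.lookup(name).name for name in spellings}

# Encode the same character with every spelling and collect the results.
encoded_forms = {"é".encode(name) for name in spellings}

print(len(canonical_names))  # 1: every alias resolves to the same codec
print(encoded_forms)         # {b'\xc3\xa9'}
```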
Upvotes: 1