william007

Reputation: 18525

encode and decode for a specific character set

The printed results look the same, so what is the point of encoding and decoding with UTF-8? And should it be encode('utf8') or encode('utf-8')?

u = 'abc'
print(u)
u = u.encode('utf-8')   # encode to UTF-8 bytes
print(u)
uu = u.decode('utf-8')  # decode the bytes back into a Unicode string
print(uu)

Upvotes: 0

Views: 146

Answers (2)

Avinash Babu

Reputation: 6252

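In Python 2, if you call .encode('utf-8') on a byte string (str), Python first implicitly decodes it to unicode (using the default encoding, normally ASCII) before it can encode it to UTF-8. There are also encodings that have nothing to do with character sets and can be applied to 8-bit strings, such as 'hex' and 'base64'.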

For example:

data = u'\u00c3'            # Unicode data
data = data.encode('utf8')  # now a UTF-8 encoded byte string
print(repr(data))

Output (Python 2): '\xc3\x83'
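For comparison, here is a minimal sketch of the same example in Python 3, where plain string literals are already Unicode and encode() returns bytes. The variable names `encoded` and `decoded` are just illustrative. It shows that one code point (U+00C3) becomes two UTF-8 bytes and that decoding restores the original string.

```python
# Python 3: str literals are Unicode, so no u'' prefix is needed.
data = '\u00c3'                  # one code point: U+00C3 (A with tilde)
encoded = data.encode('utf8')    # UTF-8 needs two bytes for this code point
print(repr(encoded))             # b'\xc3\x83'
print(len(data), len(encoded))   # 1 2

decoded = encoded.decode('utf8') # bytes -> str again
print(decoded == data)           # True
```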

Please have a look here and here; they should be helpful.
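The point above about encodings that have nothing to do with character sets can be sketched in Python 3 with the standard codecs module. In Python 2 you could write 'abc'.encode('hex'); in Python 3, str.encode() only accepts text encodings, so bytes-to-bytes codecs such as 'hex' and 'base64' go through codecs.encode()/codecs.decode() instead. This is a small illustration, not part of the original answer.

```python
import codecs

raw = b'abc'
# 'hex' and 'base64' transform bytes into other bytes;
# no character set is involved at any point.
hexed = codecs.encode(raw, 'hex')
b64 = codecs.encode(raw, 'base64')
print(hexed)   # b'616263'
print(b64)     # b'YWJj\n'  (the base64 codec appends a newline)

# Both transformations are reversible.
print(codecs.decode(hexed, 'hex') == raw)     # True
print(codecs.decode(b64, 'base64') == raw)    # True
```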

Upvotes: 0

Nick T

Reputation: 26717

str.encode encodes the string (or unicode string) into a series of bytes. In Python 3 the result is a bytes object; in Python 2 it is str again (confusingly). When you encode a Unicode string, you are left with bytes, not Unicode. Remember that UTF-8 is not Unicode: it is an encoding method that turns Unicode code points into bytes.
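A short Python 3 sketch of that distinction: the string below has 4 code points, but its UTF-8 encoding is 5 bytes, because U+00E9 needs two bytes. The example word is an arbitrary choice, not from the question.

```python
text = 'caf\u00e9'             # 'café': a str of Unicode code points
data = text.encode('utf-8')    # a bytes object (immutable, not bytearray)

print(type(text).__name__, len(text))   # str 4
print(type(data).__name__, len(data))   # bytes 5
print(data)                             # b'caf\xc3\xa9'
```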

Decoding (bytes.decode in Python 3, str.decode in Python 2) reads the serialized byte stream with the selected codec, picks the proper Unicode code points, and gives you back a Unicode string.
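To see why the selected codec matters, here is a sketch in Python 3 that decodes the same bytes twice. With UTF-8, the two bytes C3 A9 become one code point; with Latin-1 (chosen here only as an example of a wrong codec), every byte becomes its own code point. ascii() is used so the output prints safely on any terminal.

```python
data = b'caf\xc3\xa9'            # UTF-8 bytes for 'café'

right = data.decode('utf-8')     # C3 A9 -> U+00E9
wrong = data.decode('latin-1')   # C3 -> U+00C3, A9 -> U+00A9

print(ascii(right), len(right))  # 'caf\xe9' 4
print(ascii(wrong), len(wrong))  # 'caf\xc3\xa9' 5
```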

So, what you're doing in Python 2 is 'abc' -> 'abc' -> u'abc', and in Python 3 it is 'abc' -> b'abc' -> 'abc'. Try also printing repr(u) or type(u) to see what changes where.
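Applied to the question's code, that suggestion looks like this. The comments show the Python 3 output; in Python 2 the three lines would instead report str, str and unicode (u'abc').

```python
u = 'abc'
print(type(u), repr(u))    # <class 'str'> 'abc'

u = u.encode('utf-8')
print(type(u), repr(u))    # <class 'bytes'> b'abc'

uu = u.decode('utf-8')
print(type(uu), repr(uu))  # <class 'str'> 'abc'
```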

'utf_8' might be the most canonical name, but 'utf-8' and 'utf8' are aliases for the same codec, so it doesn't really matter which one you use.
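A quick way to check this in Python 3 is codecs.lookup(), which resolves an encoding name to its codec. The list of spellings below is just a sample of common variants.

```python
import codecs

for name in ('utf_8', 'utf-8', 'utf8', 'UTF-8'):
    # every spelling resolves to the same codec, named 'utf-8'
    print(name, '->', codecs.lookup(name).name)

# and they produce identical bytes
print('abc'.encode('utf8') == 'abc'.encode('utf-8') == 'abc'.encode('utf_8'))  # True
```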

Upvotes: 1
