Python: Question about encoding

Question

I'm trying to understand how encoding works in Python, and I think I have nearly got it. Here is some code; I will explain what I think it does, and I would like you to verify my understanding :)

text = line.decode( encoding )
print "type(text) = %s" % type(text)
iso_8859_1 = text.encode('latin1')
print "type(iso_8859_1) = %s" % type(iso_8859_1)
unicodeStr = text.encode('utf-8')
print "type(unicodeStr) = %s" % type(unicodeStr)

So the first line

text = line.decode( encoding )

transforms a string given in the encoding "encoding" into Python's unicode text format. Therefore the output is

type(text) = <type 'unicode'>

So now I am using the original text from my file, which is UTF-8 encoded, and for the rest of my code "text" is UTF-8 text.

Now I want to transform (for whatever reason) the UTF-8 text into something else, e.g. latin1, which is done by "text.encode('latin1')". The output of my code in that case is

type(iso_8859_1) = <type 'str'>
type(unicodeStr) = <type 'str'>

Now, the only question that remains for me: why is the type in the two latter cases 'str' and not 'latin1' or 'unicode'? That is what is still unclear to me.

Are the latter strings "iso_8859_1" and "unicodeStr" not encoded in "latin1" or "unicode" respectively?

Pill · Accepted Answer

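First, utf8 != unicode.
str is basically a sequence of bytes, and an encoding is a method of interpreting that sequence; unicode is, well, unicode. That is why both encode() calls give you a str: the result is raw bytes, and the type does not record which encoding produced them.
Joel had a great post on this subject: http://www.joelonsoftware.com/articles/Unicode.html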
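
To make the bytes-versus-text distinction concrete, here is a minimal sketch. It is written to run on Python 3, where the question's `unicode` type is called `str` and the question's `str` is called `bytes`. The sample word and all variable names are made up for illustration and are not from the question.

```python
# Illustrative sample text (an assumption, not from the question):
# "cafe" with an accented e, stored as Unicode code points.
text = u"caf\u00e9"

# Encoding turns text into raw bytes; the encoding only decides WHICH bytes.
latin1_bytes = text.encode("latin-1")   # the accented e becomes one byte: 0xE9
utf8_bytes = text.encode("utf-8")       # the accented e becomes two bytes: 0xC3 0xA9

# Both results have the same type: a plain byte sequence with no encoding label.
same_type = type(latin1_bytes) is type(utf8_bytes)

# Decoding with the matching encoding recovers the same text from either one.
round_trip_ok = (latin1_bytes.decode("latin-1") == text
                 and utf8_bytes.decode("utf-8") == text)

# Decoding with the wrong encoding does not fail here; it yields different text.
mojibake = utf8_bytes.decode("latin-1")

print(same_type, round_trip_ok)
print(len(text), len(latin1_bytes), len(utf8_bytes))
print(repr(mojibake))
```

On Python 2, as in the question, the two encoded values report `<type 'str'>`; on Python 3 they report `bytes`. In neither case does the type carry the encoding name, so the program has to keep track of it separately.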
