WhiskerBiscuit

Reputation: 5157

How can I convert this to unicode so it displays properly?

I'm querying a database which from the MySQL workbench returns the following value:

VitÃ³ria da Conquista

which should be displayed as:

Vitória da Conquista

No matter what I've tried, I can't convert u'Vit\xc3\xb3ria da Conquista' into 'Vitória da Conquista'. Here is my code:

#Querying MySQL "world" database
print "====================================="
query = 'select name from city where id=283;'
cursor.execute(query)
cities = cursor.fetchall()
print cities
for city in cities:     
    cs = str(city)
    cs = cs[3:-3].decode('utf-8')
    print cs
    print cs.decode('utf-8')
    print cs.encode('ascii','ignore')

The output looks like this:

=====================================
[(u'Vit\xc3\xb3ria da Conquista',)]
Vit\xc3\xb3ria da Conquista
Vit\xc3\xb3ria da Conquista
Vit\xc3\xb3ria da Conquista

Upvotes: 0

Views: 131

Answers (3)

WhiskerBiscuit

Reputation: 5157

Well, this actually worked, although I'm not sure why. I am now getting the correct value, Vitória da Conquista, but I would still like to understand what is happening.

#Querying MySQL "world" database
query = 'SELECT CONVERT(CAST(Name as BINARY) USING utf8) from city where id = 283;'
cursor.execute(query)
cities = cursor.fetchall()
for tup in cities:     
    cs=tup[0]
    print cs
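
One plausible explanation, offered as an assumption rather than something confirmed in this thread: the column holds UTF-8 bytes but is declared (or read) as latin1, so each byte of the two-byte "ó" comes back as its own character. CAST(... AS BINARY) drops the wrong character set label and CONVERT(... USING utf8) reinterprets the raw bytes as UTF-8. The same round trip can be sketched client-side. The snippet uses Python 3 syntax, and repair_double_encoded is a hypothetical helper name, not part of any library:

```python
def repair_double_encoded(text):
    """Undo UTF-8 bytes that were mis-decoded as Latin-1.

    Hypothetical helper: it assumes every character in `text` fits in
    Latin-1 and that the resulting bytes are valid UTF-8.
    """
    # encode('latin-1') recovers the original bytes one-to-one,
    # decode('utf-8') then reads them with the encoding they were written in.
    return text.encode('latin-1').decode('utf-8')


broken = 'Vit\xc3\xb3ria da Conquista'   # the value shown in the question
fixed = repair_double_encoded(broken)
print(fixed)
```

In Python 2 the equivalent expression on a unicode object would be cs.encode('latin-1').decode('utf-8'). Repairing the data or the connection character set at the source is usually preferable to patching every query or every result.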

Upvotes: 1

MikeHunter

Reputation: 4304

You are getting unicode strings back, stored in a list of tuples, which is what fetchall() returns. So you don't need to encode or decode at all; just take the first element of each tuple. Try this:

#Querying MySQL "world" database
print "====================================="
query = 'select name from city where id=283;'
cursor.execute(query)
cities = cursor.fetchall()
for tup in cities:     
    cs = tup[0]
    print cs

If this doesn't print right, then you probably have issues with your terminal, as mentioned by @Jarrod Roberson. The only other possibility is that the data was entered into, or is being returned from, the database with the wrong (unexpected) encoding.
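
To check whether the wrong-encoding case applies, it can help to reproduce the symptom deliberately. The sketch below (Python 3 syntax; both function names are hypothetical) shows how reading UTF-8 bytes as Latin-1 yields exactly the value from the question, and offers a rough heuristic for spotting such strings. It assumes the mismatch is between UTF-8 and Latin-1 specifically:

```python
def misread_as_latin1(text):
    """Simulate storing text as UTF-8 and reading the bytes back as Latin-1."""
    return text.encode('utf-8').decode('latin-1')


def looks_double_encoded(text):
    """Heuristic: True if text turns into a different valid string when its
    Latin-1 bytes are reinterpreted as UTF-8."""
    try:
        candidate = text.encode('latin-1').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not representable in Latin-1, or the bytes are not valid UTF-8:
        # this string was probably decoded correctly in the first place.
        return False
    return candidate != text


symptom = misread_as_latin1('Vit\u00f3ria da Conquista')
print(ascii(symptom))                  # escaped form, as in the question's output
print(looks_double_encoded(symptom))
print(looks_double_encoded('Vit\u00f3ria da Conquista'))
```

If the heuristic fires on values coming straight from the cursor, the terminal is not the culprit and the table or connection character set is worth inspecting.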

Upvotes: 0

bortzmeyer

Reputation: 35459

If the incoming data is UTF-8 (which it looks like it is), then in Python 2 use unicode() to convert it from bytes to a Python Unicode string:

cs = unicode(cs[3:-3], "utf-8")

Basic rule: inside your code, always use Unicode strings. Convert input data with unicode() and output data with encode().
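
A minimal sketch of that rule, written in Python 3 syntax, where bytes.decode() plays the role of Python 2's unicode() and str is already a Unicode string. The byte literal stands in for data arriving from outside the program and is an illustrative assumption:

```python
# Input boundary: raw bytes arrive from a file, socket, or database driver.
raw_input_bytes = b'Vit\xc3\xb3ria da Conquista'

# Decode once, as early as possible.
text = raw_input_bytes.decode('utf-8')

# Inside the program, work only with Unicode text.
shouted = text.upper()

# Output boundary: encode once, as late as possible.
raw_output_bytes = shouted.encode('utf-8')

print(len(raw_input_bytes), len(text))   # byte count vs. character count
print(raw_output_bytes)
```

Note that this only helps when the input really is a byte string. If the driver already returns Unicode strings, as in the question, decoding again is not needed.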

Upvotes: 0
