hkoosha
hkoosha

Reputation: 1146

unicode in python turns into '\xXX\xXX'

I'm using MySQLdb, I use:

cursor.execute(QUERY)

Where QYERY is:

INSERT (A,B,C) VALUES("%s","%s",something);'%(k1, u'W')**

w Is a persian letter (literally it is! I just replaced it with W in case you have problem viewing persian letters, it is something like ب), and k1 is english. but i get:

`Traceback (most recent call last):
  File "./i.py", line 47, in <module>
    c.doit(db)
  File "./i.py", line 41, in doit
    db.cr.execute('INSERT (A,B,C) VALUES("%s","%s",something);'%(k1, u'ب'))
  File "/usr/lib/pymodules/python2.7/MySQLdb/cursors.py", line 174, in execute
    self.errorhandler(self, exc, value)
  File "/usr/lib/pymodules/python2.7/MySQLdb/connections.py", line 36, in defaulterrorhandler
    raise errorclass, errorvalue
_mysql_exceptions.ProgrammingError: (1064, 'You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near \'(term_tit,term_desc,term_cat) VALUES("a\n","\xd8\xa8",general)\' at line 1')`

See? the W letter turns into "\xd8\xa8" What should i do?

Update: The mistake in sql query is fixed, BUT the \xd8\xa8 problem persists, now when retrieving data from Database, i get "??????" instead of what i entered

Upvotes: 2

Views: 765

Answers (2)

Raymond Hettinger
Raymond Hettinger

Reputation: 226296

The docs recommend avoiding problems by using placeholder marks and letting Python do the string substitution for you:

 db.cr.execute('INSERT (A,B,C) VALUES(?, ?,something)', (k1, u'W'))

The placeholder marks are mainly about preventing injection attacks but they also take care of properly escaping string and dealing with encoding issues.

Let me know if this helps with the unicode substitution issue.

Upvotes: 0

ruakh
ruakh

Reputation: 183301

You're missing an INTO table_name clause. That's the only actual problem I see.

The changing of ب to \xd8\xa8 just means that your Unicode is getting encoded as UTF-8 on the way to the database (which is probably what you want). MySQL is reporting the UTF-8-encoded bytes by using a \xXX notation (hexadecimal escape sequences; for example, \x20 means an ASCII space) to give an all-ASCII error message (which can be a bit cryptic, especially if you're not used to it, but it's not a problem).

Upvotes: 3

Related Questions