TCAllen07
TCAllen07

Reputation: 1404

Python MySQLdb escape char: query works in MySQL but not in python MySQLdb

I'm trying to hand off data from Pandas (imported from a CSV) to a MySQL database via Python's MySQLdb library. I'm having trouble when literal backslashes come into play. I escape the single backslashes from the raw input, so Python knows they're a literal backslash, not an escape on the subsequent character. But when I try to execute an INSERT query, MySQLdb says there's a syntax error. But here's the confusing & frustrating part: If I copy/paste the exact string into MySQL, it executes without problem.

I've tried to keep the data & structure as close to possible as the actual data, but changed it to retain privacy. Note that there are two similarly-offending values, at the end of the SourceSystemID column in the first row, and the MiddleInitial column in the second row.

In [39]: test
Out[39]: 

  ehrSystemID SourceSystemID LastName FirstName MiddleInitial  Sex  
0   fakePlace           ABC\      NaN       NaN           NaN  NaN   
1   fakePlace            XYZ    Smith      John             \    M   
          npi  deaNumber LicenseNumber ProvSpecialty  dob  
0  1234567890  AB1234567       !123456      Internal  NaN  
1         NaN        NaN       B123456      Internal  NaN  

The values of these rows converted to a string to append to the end of an INSERT statement (note that all MySQL columns will be varchar, so all values are wrapped in single-quotes)

In [40]: testVals
Out[40]: "('fakePlace', 'ABC\\', '', '', '', '', '1234567890', 'AB1234567', '!123456', 'Internal', ''), ('fakePlace', 'XYZ', 'Smith', 'John', '\\', 'M', '', '', 'B123456', 'Internal', '')"

The command I pass to MySQLdb and resulting error:

In [41]: testCmd1
Out[41]: "INSERT INTO source_providers (ehrSourceID, sourceSystemID, nameLast, nameFirst, nameMiddle, sex, npiRaw, dea, licenseNumber, specialty1, dobRaw) VALUES ('fakePlace', 'ABC\\', '', '', '', '', '1234567890', 'AB1234567', '!123456', 'Internal', ''), ('fakePlace', 'XYZ', 'Smith', 'John', '\\', 'M', '', '', 'B123456', 'Internal', '')"

In [42]: db.Cur.execute(testCmd1)
---------------------------------------------------------------------------
ProgrammingError                          Traceback (most recent call last)
<ipython-input-42-32fe62e740d8> in <module>()
----> 1 db.Cur.execute(testCmd1)

/Library/Python/2.7/site-packages/MySQL_python-1.2.4b4-py2.7-macosx-10.8-intel.egg/MySQLdb/cursors.pyc in execute(self, query, args)
    200             del tb
    201             self.messages.append((exc, value))
--> 202             self.errorhandler(self, exc, value)
    203         self._executed = query
    204         if not self._defer_warnings: self._warning_check()

/Library/Python/2.7/site-packages/MySQL_python-1.2.4b4-py2.7-macosx-10.8-intel.egg/MySQLdb/connections.pyc in defaulterrorhandler(***failed resolving arguments***)
     34     del cursor
     35     del connection
---> 36     raise errorclass, errorvalue
     37 
     38 re_numeric_part = re.compile(r"^(\d+)")

ProgrammingError: (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '1234567890', 'AB1234567', '!123456', 'Internal', ''), ('fakePlace', 'XYZ', 'Smit' at line 1")

Successfully executing the exact command in MySQL directly:

mysql> INSERT INTO source_providers (ehrSourceID, sourceSystemID, nameLast, nameFirst, nameMiddle, sex, npiRaw, dea, licenseNumber, specialty1, dobRaw) VALUES ('fakePlace', 'ABC\\', '', '', '', '', '1234567890', 'AB1234567', '!123456', 'Internal', ''), ('fakePlace', 'XYZ', 'Smith', 'John', '\\', 'M', '', '', 'B123456', 'Internal', '');
Query OK, 2 rows affected (0.01 sec)
Records: 2  Duplicates: 0  Warnings: 0

Is there something that happens when the string is passed to the MySQL API? How can I avoid filtering out the offending rows and manually inserting them separately? This involves many tens of thousands of rows, thus my focus on automating the process.

Thanks.

Upvotes: 2

Views: 2643

Answers (1)

TCAllen07
TCAllen07

Reputation: 1404

It turns out the confusion was in how Python printed the query string, combined with a bug in my own code. First, the bug: I incorrectly used str.replace() to replace the single literal backslash with an escaped double:

sqlCmd.replace('\\', '\\\\') 

So when Python showed the double-slash when printing the string, I assumed it had successfully replaced the single-slash. The correct code (in case anyone who reads this makes the same mistake) is:

sqlCmd = sqlCmd.replace('\\', '\\\\')

The second source of confusion was that the Python interpreter uses __repr__() which automatically inserts an escape backslash for displaying but this escape backslash isn't actually in the original string. Python assumes you're smart enough to know this. Turns out I wasn't. ;-)

A short additional explanation of __repr__() and backslashes is available in an answer to another stackoverflow question here.

Upvotes: 5

Related Questions