Reputation: 1404
I'm trying to hand off data from Pandas (imported from a CSV) to a MySQL database via Python's MySQLdb library. I'm having trouble when literal backslashes come into play. I escape the single backslashes from the raw input, so Python knows they're a literal backslash, not an escape on the subsequent character. But when I try to execute an INSERT query, MySQLdb says there's a syntax error. But here's the confusing & frustrating part: If I copy/paste the exact string into MySQL, it executes without problem.
I've tried to keep the data & structure as close to possible as the actual data, but changed it to retain privacy. Note that there are two similarly-offending values, at the end of the SourceSystemID column in the first row, and the MiddleInitial column in the second row.
In [39]: test
Out[39]:
ehrSystemID SourceSystemID LastName FirstName MiddleInitial Sex
0 fakePlace ABC\ NaN NaN NaN NaN
1 fakePlace XYZ Smith John \ M
npi deaNumber LicenseNumber ProvSpecialty dob
0 1234567890 AB1234567 !123456 Internal NaN
1 NaN NaN B123456 Internal NaN
The values of these rows converted to a string to append to the end of an INSERT statement (note that all MySQL columns will be varchar, so all values are wrapped in single-quotes)
In [40]: testVals
Out[40]: "('fakePlace', 'ABC\\', '', '', '', '', '1234567890', 'AB1234567', '!123456', 'Internal', ''), ('fakePlace', 'XYZ', 'Smith', 'John', '\\', 'M', '', '', 'B123456', 'Internal', '')"
The command I pass to MySQLdb and resulting error:
In [41]: testCmd1
Out[41]: "INSERT INTO source_providers (ehrSourceID, sourceSystemID, nameLast, nameFirst, nameMiddle, sex, npiRaw, dea, licenseNumber, specialty1, dobRaw) VALUES ('fakePlace', 'ABC\\', '', '', '', '', '1234567890', 'AB1234567', '!123456', 'Internal', ''), ('fakePlace', 'XYZ', 'Smith', 'John', '\\', 'M', '', '', 'B123456', 'Internal', '')"
In [42]: db.Cur.execute(testCmd1)
---------------------------------------------------------------------------
ProgrammingError Traceback (most recent call last)
<ipython-input-42-32fe62e740d8> in <module>()
----> 1 db.Cur.execute(testCmd1)
/Library/Python/2.7/site-packages/MySQL_python-1.2.4b4-py2.7-macosx-10.8-intel.egg/MySQLdb/cursors.pyc in execute(self, query, args)
200 del tb
201 self.messages.append((exc, value))
--> 202 self.errorhandler(self, exc, value)
203 self._executed = query
204 if not self._defer_warnings: self._warning_check()
/Library/Python/2.7/site-packages/MySQL_python-1.2.4b4-py2.7-macosx-10.8-intel.egg/MySQLdb/connections.pyc in defaulterrorhandler(***failed resolving arguments***)
34 del cursor
35 del connection
---> 36 raise errorclass, errorvalue
37
38 re_numeric_part = re.compile(r"^(\d+)")
ProgrammingError: (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '1234567890', 'AB1234567', '!123456', 'Internal', ''), ('fakePlace', 'XYZ', 'Smit' at line 1")
Successfully executing the exact command in MySQL directly:
mysql> INSERT INTO source_providers (ehrSourceID, sourceSystemID, nameLast, nameFirst, nameMiddle, sex, npiRaw, dea, licenseNumber, specialty1, dobRaw) VALUES ('fakePlace', 'ABC\\', '', '', '', '', '1234567890', 'AB1234567', '!123456', 'Internal', ''), ('fakePlace', 'XYZ', 'Smith', 'John', '\\', 'M', '', '', 'B123456', 'Internal', '');
Query OK, 2 rows affected (0.01 sec)
Records: 2 Duplicates: 0 Warnings: 0
Is there something that happens when the string is passed to the MySQL API? How can I avoid filtering out the offending rows and manually inserting them separately? This involves many tens of thousands of rows, thus my focus on automating the process.
Thanks.
Upvotes: 2
Views: 2643
Reputation: 1404
It turns out the confusion was in how Python printed the query string, combined with a bug in my own code.
First, the bug: I incorrectly used str.replace()
to replace the single literal backslash with an escaped double:
sqlCmd.replace('\\', '\\\\')
So when Python showed the double-slash when printing the string, I assumed it had successfully replaced the single-slash. The correct code (in case anyone who reads this makes the same mistake) is:
sqlCmd = sqlCmd.replace('\\', '\\\\')
The second source of confusion was that the Python interpreter uses __repr__()
which automatically inserts an escape backslash for displaying but this escape backslash isn't actually in the original string. Python assumes you're smart enough to know this. Turns out I wasn't. ;-)
A short additional explanation of __repr__()
and backslashes is available in an answer to another stackoverflow question here.
Upvotes: 5