Reputation: 31531
I am trying to retrieve the value of the fields in this SQL statement, however I am running into trouble with escaped quote characters:
sql = "INSERT INTO `shops` VALUES (35723,'Counsel\'s kitchen');"
I am playing with variants of the following, none of which are satisfying:
re.select("\(\d*, '([^']*)',", sql);
That is:
\(\d*, ' Opening parentheses followed by any amount of numerals followed by a comma, followed by a space, followed by a single quote.
([^']*) Retrieve all characters other than a single quote.
', Single quote, comma
My best attempt so far:
re.select("\(\d*, '(\.*)','", sql);
That is:
\(\d*, ' Opening parentheses followed by any amount of numerals followed by a comma, followed by a space, followed by a single quote.
(\.*) Retrieve all characters.
',' Single quote, comma, single quote.
However, I would really like a way to express "Every character, including the two-character string \'
, but not including the single character '
". I had considered simply replacing \'
with some obscure string, performing '(\.*)'
, and then replacing the obscure string back with '
(No escape character, as it is no longer needed). However, being Python, surely there is a cleverer way!
Note that the string is actually repeated many time in the real output, and that I do need all the values (ideally in a list):
sql = """
INSERT INTO `shops` VALUES (35723,'Counsel\'s kitchen','Some address'),(32682,'Anderson and his bar','12 Main street'),(32491,'Sid\'s guitar\'s string','Old London'),(39119,'Roger\'s wall',''),(45914,'David drinks a beer','New London');
"""
Upvotes: 1
Views: 474
Reputation: 1042
Buildung upon @HamZa 's suggestion when you can guarantee the single quotes it gets easier to group in bigger contexts:
'(?:\\'|[^'])*'
Otherwise, if you add other groups you have to update the backreferences
This also should be slightly faster as it has no lookahead - if you care. (According to regex page: 114 Steps opposed to 200 Steps)
If you need both, for performance reasons this would also work (escape "
as needed)
'(?:\\'|[^'])*'|"(?:\\"|[^"])*"
All of these solutions have a small flaw on corrupt input like
'Counsel\'s kitchen', 'tes\\t\'
the last group will still be matched!
Upvotes: 2
Reputation: 4282
Could you say what version of Python you're using? On my 2.7, it seems to do the right thing with the escaped quote inside """ already, so then you can extract the data as a list of lists like this:
[re.split("'?,'",t.strip("'")) for t in re.findall("\((.*?)\)",sql)]
Upvotes: 1