Andaris

Reputation: 635

Regular Expression to Match All Enclosed '' (2 Single Quotes)

I am looking for a regex that gives me a capture group for each pair of single quotes ('') inside the single-quoted strings ('string') of a comma-separated list. For instance, the string 'tom''s' would have a single group between the m and the s. I've come close, but I keep getting tripped up, either by erroneously matching the enclosing single quotes or by capturing only some of the pairs within a string.

Example Input

'11','22'',','''33','44''','''55''','6''''6'

Desired Groups (7, shown in parens)

 '11','22(''),','('')33','44('')','('')55('')','6('')('')6'

For context, what I'm ultimately attempting to do is replace these 2 single quotes within the comma-separated sequence of strings with another value in order to make subsequent parsing easier.

Note also that the single-quoted strings may themselves contain commas.

Upvotes: 4

Views: 329

Answers (1)

Wiktor Stribiżew

Reputation: 627607

You cannot match the doubled single quotes like that with Python's re module. Instead, match each single-quoted entry, capture its inner part, and replace the '' inside it with a plain str.replace call in a lambda:

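import re
# Match one single-quoted entry; group 1 is its inner text ('' pairs included)
p = re.compile(r"'([^']*(?:''[^']*)*)'")
test_str = "'11','22'',','''33','44''','''55''','6''''6'"
# Rebuild each entry, swapping every inner '' for a placeholder (& here)
print(p.sub(lambda m: "'{}'".format(m.group(1).replace("''", "&")), test_str))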

Output: '11','22&,','&33','44&','&55&','6&&6'
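
If the positions of the individual '' pairs are needed rather than a replaced string, the same two-step idea can be used to collect them. The following is only a sketch under that assumption; the names ENTRY and doubled_quote_spans are made up for illustration and are not part of the original answer.

```python
import re

# Same entry pattern as in the answer: group 1 is the inner text of one entry.
ENTRY = re.compile(r"'([^']*(?:''[^']*)*)'")

def doubled_quote_spans(text):
    """Return (start, end) offsets in `text` of every '' inside a quoted entry."""
    spans = []
    for entry in ENTRY.finditer(text):
        inner_start = entry.start(1)  # offset of the inner text within `text`
        # Scan only the inner text, so the enclosing quotes are never matched.
        for pair in re.finditer("''", entry.group(1)):
            spans.append((inner_start + pair.start(), inner_start + pair.end()))
    return spans

test_str = "'11','22'',','''33','44''','''55''','6''''6'"
spans = doubled_quote_spans(test_str)
print(len(spans))  # 7, one per pair marked in the question
```

Each returned span slices back to '' in the input, so the pairs can be replaced or inspected individually without ever touching the enclosing quotes.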

The regex is '([^']*(?:''[^']*)*)':

  • ' - opening '
  • ( - Capture group #1 start
  • [^']* - zero or more non-'
  • (?:''[^']*)* - 0+ sequences of '' followed by 0+ non-'
  • ) - Capture group #1 end
  • ' - closing '
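
The breakdown above can also be kept next to the pattern itself by compiling it with re.VERBOSE. This is a sketch, not part of the original answer; the name ENTRY_VERBOSE is an illustrative assumption, and the pattern is intended to be equivalent to the compact one.

```python
import re

# The answer's pattern, spelled out with re.VERBOSE so each part is commented.
ENTRY_VERBOSE = re.compile(r"""
    '               # opening quote
    (               # capture group 1 start
        [^']*       # zero or more non-quote characters
        (?:         # non-capturing group, repeated zero or more times:
            ''      #   two single quotes in a row
            [^']*   #   followed by zero or more non-quote characters
        )*
    )               # capture group 1 end
    '               # closing quote
""", re.VERBOSE)

test_str = "'11','22'',','''33','44''','''55''','6''''6'"
inner_parts = ENTRY_VERBOSE.findall(test_str)  # one inner text per entry
print(inner_parts)
```

With a single capture group, findall returns just the inner text of each entry, which makes it easy to check that commas and '' pairs inside an entry stay together.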

Upvotes: 3
