Reputation: 635
I am looking for a regex that will provide me with capture groups for each set of 2 single quotes ('') within the single-quoted strings ('string') that are part of a comma-separated list. For instance, the string 'tom''s' would have a single group between the m and the s. I've come close but keep getting tripped up, either by erroneously matching the enclosing single quotes or by capturing only some of the doubled single quotes within a string.
Example Input
'11','22'',','''33','44''','''55''','6''''6'
Desired Groups (7, shown in parens)
'11','22(''),','('')33','44('')','('')55('')','6('')('')6'
For context, what I'm ultimately attempting to do is replace these 2 single quotes within the comma-separated sequence of strings with another value in order to make subsequent parsing easier.
Note also that commas may be contained within the single quoted strings.
Upvotes: 4
Views: 329
Reputation: 627607
You cannot match the doubled single quotes directly like that with Python's re module. Instead, match each single-quoted entry, capture its inner part, and use a lambda to replace the '' inside that part with a plain .replace call:
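import re

# Match one single-quoted entry; group 1 is its inner text, where '' is an escaped quote.
p = re.compile(r"'([^']*(?:''[^']*)*)'")
test_str = "'11','22'',','''33','44''','''55''','6''''6'"
# Rebuild each entry, replacing every '' inside it with the placeholder "&".
print(p.sub(lambda m: "'{}'".format(m.group(1).replace("''", "&")), test_str))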
See IDEONE demo, output: '11','22&,','&33','44&','&55&','6&&6'
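Since the stated goal is to make later parsing easier, the same pattern can be wrapped in small helpers. This is only a sketch: the names replace_doubled_quotes and parse_entries and the placeholder parameter are hypothetical, not part of the original answer, and it assumes every entry in the input is a well-formed single-quoted string.

```python
import re

# Same pattern as above: group 1 is the inner text of one quoted entry.
ENTRY = re.compile(r"'([^']*(?:''[^']*)*)'")

def replace_doubled_quotes(text, placeholder="&"):
    """Replace each '' inside a single-quoted entry with `placeholder`."""
    def fix(m):
        return "'" + m.group(1).replace("''", placeholder) + "'"
    return ENTRY.sub(fix, text)

def parse_entries(text):
    """Return the entries as a list, with each '' unescaped to a single '."""
    return [inner.replace("''", "'") for inner in ENTRY.findall(text)]

sample = "'11','22'',','''33','44''','''55''','6''''6'"
print(replace_doubled_quotes(sample))
print(parse_entries(sample))
```

If the end goal is just the list of values, parse_entries skips the placeholder step entirely, which avoids having to pick a placeholder that never occurs in the data.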
The regex is '([^']*(?:''[^']*)*)':
' - opening '
( - Capture group #1 start
[^']* - zero or more non-' characters
(?:''[^']*)* - 0+ sequences of '' followed by 0+ non-' characters
) - Capture group #1 end
' - closing '
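The breakdown above can also be kept next to the pattern itself by compiling it with re.VERBOSE, which ignores whitespace in the pattern and allows comments. A minimal sketch (the name ENTRY_VERBOSE and the sample string are made up for illustration):

```python
import re

# Verbose form of '([^']*(?:''[^']*)*)' with the explanation inline.
ENTRY_VERBOSE = re.compile(r"""
    '               # opening quote
    (               # capture group 1 start
      [^']*         # zero or more non-quote characters
      (?:''[^']*)*  # zero or more: a doubled quote, then non-quote characters
    )               # capture group 1 end
    '               # closing quote
""", re.VERBOSE)

# findall returns the contents of group 1 for every entry.
inner_parts = ENTRY_VERBOSE.findall("'tom''s','a,b'")
print(inner_parts)
```

Note that the comma inside 'a,b' stays part of the captured entry, because the pattern only treats quotes as delimiters.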
Upvotes: 3