Reputation: 6941
I would like to convert the following string
"For "The" Win","Way "To" Go"
to
"For ""The"" Win","Way ""To"" Go"
The straightforward regex would be
str2 = re.sub(r'(?<!,|^)"(?=\w)|(?<=\w)"(?!,|$)', '""', str1,flags=re.MULTILINE)
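i.e., double the quotes that are not preceded by a comma or a line start and are followed by a word character, or that are preceded by a word character and not followed by a comma or a line end.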
The problem is that I am using Python, and its regex engine does not accept the OR operator in this lookbehind: ',' matches one character while '^' matches none, so the lookbehind is not fixed-width. I get the error
sre_constants.error: look-behind requires fixed-width pattern
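A small sketch of the restriction, assuming Python 3 (where the exception is available as re.error): alternatives inside a lookbehind are accepted when they all have the same width, and rejected when, as here, ',' is one character wide and '^' is zero. The helper name compiles is just for illustration.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts `pattern`."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Both alternatives are one character wide, so the lookbehind is fixed-width.
print(compiles(r'(?<!,|;)"'))  # True

# ',' is one character wide but '^' is zero-width, so the widths differ.
print(compiles(r'(?<!,|^)"'))  # False
```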
What I am looking for is a regex that will replace the '"' around 'The' and 'To' with '""'.
I can use the following regex (an answer provided to another question):
\b\s*"(?!,|[ \t]*$)
but that consumes the space just before 'The' and 'To', and I get the following:
"For""The"" Win","Way""To"" Go"
Is there a workaround so that I can double the quotes around 'The' and 'To' without consuming the spaces just before them?
Upvotes: 2
Views: 3178
Reputation: 75222
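Looks to me like you don't need to bother with anchors: every quote you want to double has a non-comma character on both sides, while the quotes you want to keep sit next to a comma or at a line boundary. So just do a positive lookbehind/lookahead for a single character: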
result = re.sub(r'(?<=[^",\r\n])"(?=[^,"\r\n])', '""', subject)
I threw '"' into both character classes on the chance that some of the quotes are already escaped (doubled). But realistically, if that's the case, you're probably screwed anyway. ;)
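A quick check of this pattern on the question's sample line, plus a second pass to illustrate the point about already-escaped quotes. Only the variable names are mine; the pattern is the one above.

```python
import re

# Pattern from the answer: a quote with a character on each side that is
# neither a comma, a quote, nor a line break.
PATTERN = r'(?<=[^",\r\n])"(?=[^,"\r\n])'

line = '"For "The" Win","Way "To" Go"'
result = re.sub(PATTERN, '""', line)
print(result)  # "For ""The"" Win","Way ""To"" Go"

# Because '"' is excluded on both sides, quotes that are already doubled
# are not touched again, so a second pass changes nothing.
second_pass = re.sub(PATTERN, '""', result)
print(second_pass == result)  # True
```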
Upvotes: 2
Reputation: 27575
str2 = re.sub('(?<=[^,])"(?=\w)'
              '|'
              '(?<=\w)"(?!,|$)',
              '""', ss,
              flags=re.MULTILINE)
I always wonder why people use raw strings for regex patterns when it isn't needed.
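For context: '\w' happens to survive in a plain string because Python has no \w string escape and leaves the backslash in place (Python 3.12 and later do emit a SyntaxWarning for such invalid escapes). Escapes that Python does recognize are a different story. A minimal sketch with \b, which a normal string literal turns into a backspace character:

```python
import re

# In a normal string literal '\b' is the backspace character (one char);
# in a raw string it stays backslash + 'b', which re reads as a word boundary.
print(len('\b'), len(r'\b'))  # 1 2

# The plain-string pattern looks for a literal backspace, so nothing matches.
print(re.search('\bcat', 'a cat'))  # None

# The raw-string pattern matches 'cat' at a word boundary.
print(re.search(r'\bcat', 'a cat').start())  # 2
```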
Note that I renamed your str, which is the name of a builtin class, to ss.
For fun:
str2 = re.sub('"'
              '('
              '(?<=[^,]")(?=\w)'
              '|'
              '(?<=\w")(?!,|$)'
              ')',
              '""', ss,
              flags=re.MULTILINE)
or also
str2 = re.sub('(?<=[^,]")(?=\w)'
              '|'
              '(?<=\w")(?!,|$)',
              '"', ss,
              flags=re.MULTILINE)
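A sketch that checks the three variants in this answer against each other on the question's sample line. It assumes the sample is in a variable named line (standing in for ss) and writes each pattern as a single raw string, which avoids the invalid-escape warning newer Pythons give for '\w' in a plain string; the patterns themselves are unchanged.

```python
import re

line = '"For "The" Win","Way "To" Go"'
expected = '"For ""The"" Win","Way ""To"" Go"'

# 1. Match the quote itself and replace it with two quotes.
v1 = re.sub(r'(?<=[^,])"(?=\w)|(?<=\w)"(?!,|$)', '""', line, flags=re.MULTILINE)

# 2. Consume the quote first, then check its context with lookarounds
#    whose lookbehinds include the quote just consumed.
v2 = re.sub(r'"((?<=[^,]")(?=\w)|(?<=\w")(?!,|$))', '""', line, flags=re.MULTILINE)

# 3. Zero-width match just after the quote; insert one extra quote there.
v3 = re.sub(r'(?<=[^,]")(?=\w)|(?<=\w")(?!,|$)', '"', line, flags=re.MULTILINE)

print(v1 == v2 == v3 == expected)  # True
```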
Upvotes: 0
Reputation: 25954
Most direct workaround whenever you encounter this issue: explode the look-behind into two look-behinds.
str2 = re.sub(r'(?<!,)(?<!^)"(?=\w)|(?<=\w)"(?!,|$)', '""', str1, flags=re.MULTILINE)
(Don't name your strings str.)
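Two lookbehinds in a row are both checked at the same position, so together they mean "not preceded by a comma and not at a line start", which is exactly what (?<!,|^) was meant to express. A sketch on the sample line and on a made-up two-line input (the variable names are illustrative):

```python
import re

# "Not preceded by a comma or a line start" as two negative lookbehinds:
# each is fixed-width (1 and 0 characters), and both must hold.
PATTERN = r'(?<!,)(?<!^)"(?=\w)|(?<=\w)"(?!,|$)'

one_line = '"For "The" Win","Way "To" Go"'
out1 = re.sub(PATTERN, '""', one_line, flags=re.MULTILINE)
print(out1)  # "For ""The"" Win","Way ""To"" Go"

# With re.MULTILINE, '^' also matches right after '\n', so the opening
# quote of the second line is left alone as well.
two_lines = '"For "The" Win"\n"Way "To" Go"'
out2 = re.sub(PATTERN, '""', two_lines, flags=re.MULTILINE)
print(repr(out2))  # '"For ""The"" Win"\n"Way ""To"" Go"'
```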
Upvotes: 1
Reputation: 97918
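Instead of saying "not preceded by a comma or the line start", say "preceded by a non-comma character" (a lookbehind that needs one character also fails at the very start of the string, where there is no character to look at):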
r'(?<=[^,])"(?=\w)|(?<=\w)"(?!,|$)'
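A sketch running this pattern on the sample line. The two-line input and the FIXED variant are my own additions, included because the question passes re.MULTILINE: a line break is also a non-comma character, so a quote at the start of a later line satisfies (?<=[^,]) and gets doubled unless the line break is excluded from the class.

```python
import re

# Pattern from the answer above.
PATTERN = r'(?<=[^,])"(?=\w)|(?<=\w)"(?!,|$)'

one_line = '"For "The" Win","Way "To" Go"'
out1 = re.sub(PATTERN, '""', one_line, flags=re.MULTILINE)
print(out1)  # "For ""The"" Win","Way ""To"" Go"

# Caveat for multi-line input: the opening quote of the second line is
# preceded by '\n', which [^,] accepts, so it is doubled by mistake.
two_lines = '"For "The" Win"\n"Way "To" Go"'
out2 = re.sub(PATTERN, '""', two_lines, flags=re.MULTILINE)
print(repr(out2))  # '"For ""The"" Win"\n""Way ""To"" Go"'

# Hypothetical tweak: exclude the line break from the lookbehind class.
FIXED = r'(?<=[^,\n])"(?=\w)|(?<=\w)"(?!,|$)'
out3 = re.sub(FIXED, '""', two_lines, flags=re.MULTILINE)
print(repr(out3))  # '"For ""The"" Win"\n"Way ""To"" Go"'
```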
Upvotes: 2