Reputation: 33
I was revisiting some old Python code that had never raised any errors before, but when I ran it again it failed. This is the code that gives me the error:
import re
text = r"I quote \"How're you?\" to you."
double = [z.start() for z in re.finditer('(?<!\\)(?:\\\\)*(")', text)]
single = [z.start() for z in re.finditer("(?<!\\)(?:\\\\)*(')", text)]
print(double)
print(single)
The output I had hoped to get from this program was:
[]
[13]
This, however, gives me the error:
double = [z.start() for z in re.finditer('(?<!(?:\\))(?:\\\\)*(")', text)]
File "C:\Users\Me\AppData\Local\Programs\Python\Python35-32\lib\re.py", line 220, in finditer
return _compile(pattern, flags).finditer(string)
File "C:\Users\Me\AppData\Local\Programs\Python\Python35-32\lib\re.py", line 293, in _compile
p = sre_compile.compile(pattern, flags)
File "C:\Users\Me\AppData\Local\Programs\Python\Python35-32\lib\sre_compile.py", line 536, in compile
p = sre_parse.parse(p, flags)
File "C:\Users\Me\AppData\Local\Programs\Python\Python35-32\lib\sre_parse.py", line 829, in parse
p = _parse_sub(source, pattern, 0)
File "C:\Users\Me\AppData\Local\Programs\Python\Python35-32\lib\sre_parse.py", line 437, in _parse_sub
itemsappend(_parse(source, state))
File "C:\Users\Me\AppData\Local\Programs\Python\Python35-32\lib\sre_parse.py", line 722, in _parse
source.tell() - start)
sre_constants.error: missing ), unterminated subpattern at position 0
It is worth mentioning that I had updated Python before running this, so maybe the update caused this error? (I am now running Python 3.5.2, but I can't remember which version I had before.)
Also, in case it helps, I was trying to find all cases of single or double quotes that were not escaped by a backslash i.e.
' and " are picked up
\' and \" are not
\\' and \\" are picked up and so on...
I was going to use this to then separate nested strings in the string from other parts of the string.
It is the negative lookbehind (?<!\\) that is causing the issue, but I cannot see what is wrong. The backslash is escaped by the one in front, so I cannot see where the missing bracket is.
Strangely, this works on regex101, so I am starting to run out of ways to debug this.
I tried different replacements for the negative lookbehind to try to get this to work:
(?<!\) #Gets the error, but that is expected
(?<!\\\\) #Same error again, same problem as the original case
(?<!\\\) #Returns [8, 20] and [13]
Clearly this last one has incorrect syntax. Python, however, is interpreting this as correct, but I have no idea what it is actually interpreting this as.
Anyway, I am aware that there is probably some simple explanation, maybe some RegEx syntax I am not aware of.
Also, if there is an alternative, less messy solution to what I am attempting, please feel free to give me that solution instead.
Thank you very much, I am nearly tearing my hair out,
EdW
Upvotes: 2
Views: 6422
Reputation: 832
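Simply add the r prefix to the front of the regex string so that it becomes a raw string literal. Without the prefix, Python collapses each \\ into a single backslash before the regex engine ever sees the pattern, so the lookbehind arrives as (?<!\), where \) is an escaped literal parenthesis and the group is never closed.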
import re
text = r"I quote \"How're you?\" to you."
double = [z.start() for z in re.finditer(r'(?<!\\)(?:\\\\)*(")', text)]
single = [z.start() for z in re.finditer(r"(?<!\\)(?:\\\\)*(')", text)]
print(double)
print(single)
Output:
[]
[13]
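To see why the prefix matters, it may help to print the pattern exactly as the regex engine receives it. The following is only a minimal sketch: the names plain, raw, plain_error and text2 are made up for illustration, and text2 is an extra test string that is not part of the question.

```python
import re

text = r"I quote \"How're you?\" to you."

# Normal literal: Python turns every \\ into one backslash before re sees it.
plain = '(?<!\\)(?:\\\\)*(")'
# Raw literal: the backslashes reach the regex engine unchanged.
raw = r'(?<!\\)(?:\\\\)*(")'

print(plain)  # (?<!\)(?:\\)*(")
print(raw)    # (?<!\\)(?:\\\\)*(")

# In the plain version, \) is a literal parenthesis, so the lookbehind never closes.
try:
    re.compile(plain)
    plain_error = None
except re.error as exc:
    plain_error = exc
print(plain_error)

# The raw version compiles and skips both escaped double quotes in text.
double = [m.start() for m in re.finditer(raw, text)]
print(double)  # []

# Hypothetical extra input: a quote preceded by an escaped backslash.
text2 = r'a \\" b'
match_starts = [m.start() for m in re.finditer(raw, text2)]
quote_starts = [m.start(1) for m in re.finditer(raw, text2)]
print(match_starts)  # [2]
print(quote_starts)  # [4]
```

Note that start() reports where the whole match begins, which includes any leading pairs of backslashes, while start(1) reports the position of the quote itself; for the text in the question the two agree. The same collapsing explains the (?<!\\\) experiment in the question: \\ becomes a single backslash and \) is not a recognised string escape, so Python leaves it alone and the regex engine receives the valid lookbehind (?<!\\). The rest of that non-raw pattern is still collapsed, though, so (?:\\\\)* arrives as (?:\\)* and matches single backslashes, which is why escaped quotes were reported.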
Upvotes: 3