EdW

Reputation: 33

Python RegEx missing parenthesis error

I was revisiting some old Python code that had never raised any errors before, but when I ran it this time it failed. This is the code that gives me the error:

import re

text = r"I quote \"How're you?\" to you."
double = [z.start() for z in re.finditer('(?<!\\)(?:\\\\)*(")', text)]
single = [z.start() for z in re.finditer("(?<!\\)(?:\\\\)*(')", text)]
print(double)
print(single)

The output I had hoped to get from this program was:

[]
[13]

This, however, gives me the error:

double = [z.start() for z in re.finditer('(?<!(?:\\))(?:\\\\)*(")', text)]
File "C:\Users\Me\AppData\Local\Programs\Python\Python35-32\lib\re.py", line 220, in finditer
return _compile(pattern, flags).finditer(string)
File "C:\Users\Me\AppData\Local\Programs\Python\Python35-32\lib\re.py", line 293, in _compile
p = sre_compile.compile(pattern, flags)
File "C:\Users\Me\AppData\Local\Programs\Python\Python35-32\lib\sre_compile.py", line 536, in compile
p = sre_parse.parse(p, flags)
File "C:\Users\Me\AppData\Local\Programs\Python\Python35-32\lib\sre_parse.py", line 829, in parse
p = _parse_sub(source, pattern, 0)
File "C:\Users\Me\AppData\Local\Programs\Python\Python35-32\lib\sre_parse.py", line 437, in _parse_sub
itemsappend(_parse(source, state))
File "C:\Users\Me\AppData\Local\Programs\Python\Python35-32\lib\sre_parse.py", line 722, in _parse
source.tell() - start)
sre_constants.error: missing ), unterminated subpattern at position 0

It is worth mentioning that I updated Python before running this, so perhaps the update caused the error. (I am now running Python 3.5.2, but I can't remember which version I had before.)

Also, in case it helps, I was trying to find all cases of single or double quotes that were not escaped by a backslash i.e.

' and " are picked up

\' and \" are not

\\' and \\" are picked up, and so on...

I was going to use this to then separate nested strings in the string from other parts of the string.
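
For illustration, the rule above amounts to this: a quote counts as unescaped when an even number of backslashes (including zero) comes directly before it. Below is a minimal sketch that checks this without a regex, assuming a hypothetical helper named `unescaped_quote_positions`. Note that it reports the index of the quote itself, not the start of any backslash run before it.

```python
def unescaped_quote_positions(text, quote):
    """Return indexes of `quote` characters preceded by an even number of backslashes.

    Hypothetical helper, written only to illustrate the rule.
    """
    positions = []
    backslashes = 0  # length of the backslash run directly before the current character
    for index, char in enumerate(text):
        if char == "\\":
            backslashes += 1
            continue
        if char == quote and backslashes % 2 == 0:
            positions.append(index)
        backslashes = 0  # any non-backslash character ends the run
    return positions


text = r"I quote \"How're you?\" to you."
double = unescaped_quote_positions(text, '"')
single = unescaped_quote_positions(text, "'")
print(double)  # []
print(single)  # [13]
```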

It is the negative lookbehind (?<!\\) that is causing the issue, but I cannot see what is wrong. The backslash is escaped by the one in front, so I cannot see where the missing bracket is.

Strangely, this works on regex101, so I am starting to run out of ways to debug this.

I tried different replacements for the negative lookbehind to try to get this to work:

(?<!\) #Gets the error, but that is expected

(?<!\\\\) #Same error again, same problem as the original case

(?<!\\\) #Returns [8, 20] and [13]

Clearly this last one has incorrect syntax. Python, however, is interpreting this as correct, but I have no idea what it is actually interpreting this as.

Anyway, I am aware that there is probably some simple explanation, maybe some RegEx syntax I am not aware of.

Also, if there is an alternative, less messy solution to what I am attempting, please feel free to give me that solution instead.

Thank you very much, I am nearly tearing my hair out,

EdW

Upvotes: 2

Views: 6422

Answers (1)

Navidad20

Reputation: 832

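Add an r prefix to the regex string literals so that they are raw strings. In a normal string literal, Python turns each \\ into a single backslash before the regex engine sees the pattern, so (?<!\\) arrives as (?<!\). There, \) is an escaped literal parenthesis, which leaves the lookbehind group without a closing bracket.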

import re
text = r"I quote \"How're you?\" to you."
double = [z.start() for z in re.finditer(r'(?<!\\)(?:\\\\)*(")', text)]
single = [z.start() for z in re.finditer(r"(?<!\\)(?:\\\\)*(')", text)]
print(double)
print(single)

Output:

[]
[13]
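
One way to see the difference is to print both literals and try to compile each. This is only a sketch: the names `plain`, `raw` and `plain_error` are made up for illustration, and since the exact error text may differ between Python versions, the code only records that `re.error` was raised.

```python
import re

plain = '(?<!\\)(?:\\\\)*(")'   # normal literal: Python collapses each \\ into one backslash
raw = r'(?<!\\)(?:\\\\)*(")'    # raw literal: backslashes reach the regex engine unchanged

print(plain)  # (?<!\)(?:\\)*(")
print(raw)    # (?<!\\)(?:\\\\)*(")

# The normal literal leaves the lookbehind unclosed, so compiling it fails.
try:
    re.compile(plain)
    plain_error = None
except re.error as exc:
    plain_error = str(exc)
print(plain_error)

# The raw literal compiles and gives the results the question hoped for.
text = r"I quote \"How're you?\" to you."
double = [m.start() for m in re.finditer(raw, text)]
single = [m.start() for m in re.finditer(r"(?<!\\)(?:\\\\)*(')", text)]
print(double)  # []
print(single)  # [13]
```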

Upvotes: 3
