Reputation: 3
My problem is quite simple. I want to parse a string like this one:
string = 'SENT (ADVWH Pourquoi) (NP (DET ce) (NC theme)) (PONCT ?)'
I want to use a regex (I am not an expert; I have only used them a few times before). I want to extract the first level of brackets, i.e. I want the result to be:
(ADVWH Pourquoi)
(NP (DET ce) (NC theme))
(PONCT ?)
I used this regex, which I tested successfully on regex101, but it doesn't even compile:
re.compile(r"\(([^()]|(?R))*\)")
I also tried these, which work on regex101 as well:
re.compile(r"\(([^\(\)]|(?R))*\)")
re.compile(r"\((([^\(\)]|(?R))*)\)")
I always get the same error from Python: unexpected end of pattern.
I really don't see what the problem is here, or why it works on regex101 but not in Python.
Thanks a lot in advance!
Upvotes: 0
Views: 78
Reputation: 5080
The built-in re module does not support recursion (the (?R) group). You need to use the PyPI package regex instead.
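A minimal sketch of what that could look like, assuming the third-party regex package is installed (pip install regex). The function names here are made up for illustration. The group is written as non-capturing, (?:...), and the full match is read with group(0), because a capturing group would make findall return only the group's contents. A plain depth-counting scan that needs only the standard library is included as an alternative, which is a different technique from the recursive pattern:

```python
try:
    import regex  # third-party package from PyPI, not the stdlib re module
except ImportError:
    regex = None


def top_level_groups_regex(text):
    """Return each outermost (...) group using a recursive pattern.

    (?R) recurses into the whole pattern, so nested brackets are
    consumed as part of the enclosing match.
    """
    pattern = regex.compile(r"\((?:[^()]|(?R))*\)")
    return [m.group(0) for m in pattern.finditer(text)]


def top_level_groups_scan(text):
    """Same result without regex: track bracket depth character by character."""
    groups = []
    depth = 0
    start = None
    for i, ch in enumerate(text):
        if ch == "(":
            if depth == 0:
                start = i  # remember where an outermost group opens
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
            if depth == 0:
                groups.append(text[start:i + 1])
    return groups


if __name__ == "__main__":
    s = "SENT (ADVWH Pourquoi) (NP (DET ce) (NC theme)) (PONCT ?)"
    for group in top_level_groups_scan(s):
        print(group)
    if regex is not None:
        # Both approaches should agree on this input.
        print(top_level_groups_regex(s) == top_level_groups_scan(s))
```

On the string from the question, both functions should return the three groups listed there. The scan version silently ignores an unmatched closing bracket, which may or may not be what you want.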
Upvotes: 1