VaidAbhishek

Reputation: 6144

Python re.compile. Unbalanced Parenthesis error

I am trying to compile a regex that accumulates a sequence of hashtags (r'#\w+') from a tweet. I want to compile two regexes that do this, one anchored at the start of the tweet and one at the end. I am using Python 2.7.2 and my code looks like this:

HASHTAG_SEQ_REGEX_PATTERN           = r"""
(                                       #Outermost grouping to match overall regex
#\w+                                    #The hashtag matching. It's a valid combination of \w+
([:\s,]*#\w+)*                          #This is an optional (0 or more) sequence of hashtags separated by [\s,:]*
)                                       #Closing parenthesis of outermost grouping to match overall regex
"""

LEFT_HASHTAG_REGEX_SEQ      = re.compile('^' + HASHTAG_SEQ_REGEX_PATTERN , re.VERBOSE | re.IGNORECASE)

When the line where I compile the regex is executed, I get the following error:

sre_constants.error: unbalanced parenthesis

I don't know why I am getting this, because I can't see any unbalanced parenthesis in my regex pattern.
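To see what the regex engine actually receives, here is a minimal sketch (Python 3 assumed; the exception is `re.error` there, and its message differs from Python 2.7's `sre_constants.error: unbalanced parenthesis`). The helper `compiles` and the name `EFFECTIVE_PATTERN` are hypothetical, added only for illustration:

```python
import re

HASHTAG_SEQ_REGEX_PATTERN = r"""
(                                       #Outermost grouping to match overall regex
#\w+                                    #The hashtag matching. It's a valid combination of \w+
([:\s,]*#\w+)*                          #This is an optional (0 or more) sequence of hashtags separated by [\s,:]*
)                                       #Closing parenthesis of outermost grouping to match overall regex
"""

def compiles(pattern, flags=0):
    """Return True if `pattern` compiles under `flags`, False on re.error."""
    try:
        re.compile(pattern, flags)
        return True
    except re.error:
        return False

# Under re.VERBOSE, everything from an unescaped '#' to the end of the line
# is a comment, so the engine effectively sees only "^(([:\s,]*)":
# two '(' but only one ')'.
EFFECTIVE_PATTERN = r"^(([:\s,]*)"

print(compiles('^' + HASHTAG_SEQ_REGEX_PATTERN, re.VERBOSE | re.IGNORECASE))  # False
print(compiles(EFFECTIVE_PATTERN))  # False
```

Both calls fail for the same reason: the second line of the pattern is swallowed entirely as a comment, and the third line loses everything after its `#`, including the `)` that would close its group.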

Upvotes: 1

Views: 7548

Answers (4)

eyquem

Reputation: 27585

You wouldn't have had this problem if you'd written the pattern as follows (and compiled it without re.VERBOSE):

HASHTAG_SEQ_REGEX_PATTERN = (
    r'('                 # Outermost grouping to match overall regex
    r'#\w+'              # The hashtag matching. It's a valid combination of \w+
    r'([:\s,]*#\w+)*'    # This is an optional (0 or more) sequence of hashtags separated by [\s,:]*
    r')'                 # Closing parenthesis of outermost grouping to match overall regex
)

Personally, I never use re.VERBOSE; I can never remember its rules about whitespace and the rest.
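A short sketch of this approach, assuming Python 3: the comments here are Python comments, stripped by the Python parser before the adjacent string literals are concatenated, so the regex engine never sees them and re.VERBOSE is not needed. The sample tweet is made up for illustration:

```python
import re

HASHTAG_SEQ_REGEX_PATTERN = (
    r'('                 # Outermost grouping
    r'#\w+'              # The first hashtag
    r'([:\s,]*#\w+)*'    # Zero or more further hashtags separated by [:\s,]*
    r')'                 # Closing parenthesis of outermost grouping
)

# Note: no re.VERBOSE, so '#' is an ordinary literal character here.
LEFT_HASHTAG_REGEX_SEQ = re.compile('^' + HASHTAG_SEQ_REGEX_PATTERN, re.IGNORECASE)

m = LEFT_HASHTAG_REGEX_SEQ.match("#python, #regex: great tips")
print(m.group(1))  # #python, #regex
```

The trailing ": " is not included because the repeated group only succeeds when the separators are followed by another `#\w+`.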

Upvotes: 2

unutbu

Reputation: 880677

Alternatively, use [#] to put a literal # in the regex; inside a character class, re.VERBOSE does not treat # as the start of a comment:

HASHTAG_SEQ_REGEX_PATTERN           = r"""
(                   #Outermost grouping to match overall regex
[#]\w+                #The hashtag matching. It's a valid combination of \w+
([:\s,]*[#]\w+)*      #This is an optional (0 or more) sequence of hashtags separated by [\s,:]*
)                   #Closing parenthesis of outermost grouping to match overall regex
"""

I find this a little more readable.
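A sketch of the [#] version covering both anchors the question asks for, assuming Python 3. `RIGHT_HASHTAG_REGEX_SEQ` is a hypothetical name for the end-of-tweet variant, and the tweet is invented test data. Appending `'$'` is safe only because the triple-quoted pattern ends with a newline, so the `$` is not swallowed by the last comment:

```python
import re

# [#] is a character class, and re.VERBOSE does not treat '#' inside a
# character class as the start of a comment.
HASHTAG_SEQ_REGEX_PATTERN = r"""
(                     # Outermost grouping to match overall regex
[#]\w+                # The first hashtag
([:\s,]*[#]\w+)*      # Zero or more further hashtags separated by [:\s,]*
)                     # Closing parenthesis of outermost grouping
"""

FLAGS = re.VERBOSE | re.IGNORECASE
LEFT_HASHTAG_REGEX_SEQ = re.compile('^' + HASHTAG_SEQ_REGEX_PATTERN, FLAGS)
# Hypothetical end-of-tweet variant; the pattern text ends with '\n',
# so the appended '$' sits on its own line, outside any comment.
RIGHT_HASHTAG_REGEX_SEQ = re.compile(HASHTAG_SEQ_REGEX_PATTERN + '$', FLAGS)

tweet = "#python #regex verbose mode bites again #re, #help"
print(LEFT_HASHTAG_REGEX_SEQ.match(tweet).group(1))    # #python #regex
print(RIGHT_HASHTAG_REGEX_SEQ.search(tweet).group(1))  # #re, #help
```

The right-anchored regex is used with `search` rather than `match`, since `match` only tries position 0.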

Upvotes: 0

Pavel Anossov

Reputation: 62948

This line is commented out right after the FIRST #:

        v----comment starts here
([:\s,]*#\w+)*  ...

Escape it:

([:\s,]*\#\w+)*  

This line too, but it doesn't cause unbalanced parenthesis :)

v----escape me
#\w+                                    #The hashtag matching ... 

 

HASHTAG_SEQ_REGEX_PATTERN           = r"""
(                 # Outermost grouping to match overall regex
\#\w+             # The hashtag matching. It's a valid combination of \w+
([:\s,]*\#\w+)*   # This is an optional (0 or more) sequence of hashtags separated by [\s,:]*
)                 # Closing parenthesis of outermost grouping to match overall regex
"""

Upvotes: 5

Explosion Pills

Reputation: 191789

You have some unescaped hashes that you mean as literal # characters, but re.VERBOSE treats each one as the start of a comment:

\#\w+
([:\s,]*\#\w+)*   #reported issue caused by this hash
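Escaping only the hash flagged above makes the pattern compile, but the unescaped `#` at the start of the first-hashtag line still turns that whole line into a comment. A small check, assuming Python 3; `PARTIALLY_FIXED` is a hypothetical name and the input string is made up:

```python
import re

# Only the '#' on the third line is escaped; the second line still starts
# with an unescaped '#', so re.VERBOSE treats that whole line as a comment.
PARTIALLY_FIXED = r"""
(                 # Outermost grouping
#\w+              # (this entire line is a comment)
([:\s,]*\#\w+)*   # Zero or more hashtags
)                 # Closing parenthesis
"""

regex = re.compile('^' + PARTIALLY_FIXED, re.VERBOSE | re.IGNORECASE)

# The effective pattern is ^(([:\s,]*\#\w+)*), which can match the empty
# string, so even a tweet with no hashtags at all "matches".
m = regex.match("no hashtags here")
print(m is not None)      # True
print(repr(m.group(1)))   # ''
```

So both hashes need escaping: one for the parenthesis error, the other for correct matching.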

Upvotes: 3
