Reputation: 6144
I am trying to compile a regex that matches a sequence of hashtags (r'#\w+') in a tweet. I want to compile two regexes: one that matches such a sequence at the start of the tweet and one that matches it at the end. I am using Python 2.7.2, and my code looks like this:
import re

HASHTAG_SEQ_REGEX_PATTERN = r"""
( #Outermost grouping to match overall regex
#\w+ #The hashtag matching. It's a valid combination of \w+
([:\s,]*#\w+)* #This is an optional (0 or more) sequence of hashtags separated by [\s,:]*
) #Closing parenthesis of outermost grouping to match overall regex
"""
LEFT_HASHTAG_REGEX_SEQ = re.compile('^' + HASHTAG_SEQ_REGEX_PATTERN, re.VERBOSE | re.IGNORECASE)
When the line that compiles the regex is executed, I get the following error:
sre_constants.error: unbalanced parenthesis
I don't know why I am getting this, as I can't see any unbalanced parenthesis in my regex pattern.
Upvotes: 1
Views: 7548
Reputation: 27585
You wouldn't have had this problem if you'd written the pattern as follows and compiled it without re.VERBOSE (with VERBOSE, each # would still start a comment):
HASHTAG_SEQ_REGEX_PATTERN = (
    r'('                # Outermost grouping to match overall regex
    r'#\w+'             # The hashtag matching. It's a valid combination of \w+
    r'([:\s,]*#\w+)*'   # An optional (0 or more) sequence of hashtags separated by [:\s,]*
    r')'                # Closing parenthesis of outermost grouping to match overall regex
)
Personally, I never use re.VERBOSE; I can never remember its rules about whitespace and other special characters such as #.
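A minimal sketch of the point above, written to run on both Python 2.7 and Python 3: the concatenated pattern compiles and matches when re.VERBOSE is left out, but passing re.VERBOSE again turns the first # into the start of a comment and compilation fails. The sample tweet is made up for illustration, and the exact error message differs between Python versions.

```python
import re

# Same concatenated pattern as above; raw strings keep \w and \s intact.
HASHTAG_SEQ_REGEX_PATTERN = (
    r'('                # outermost group: the whole hashtag sequence
    r'#\w+'             # first hashtag
    r'([:\s,]*#\w+)*'   # zero or more further hashtags separated by [:\s,]*
    r')'                # close outermost group
)

# Compiled WITHOUT re.VERBOSE: '#' is an ordinary literal character here.
LEFT_HASHTAG_REGEX_SEQ = re.compile('^' + HASHTAG_SEQ_REGEX_PATTERN, re.IGNORECASE)
match = LEFT_HASHTAG_REGEX_SEQ.match('#python, #regex: why does this fail?')
print(match.group(1))  # prints: #python, #regex

# Compiled WITH re.VERBOSE: the first '#' starts a comment running to the end
# of this single-line string, so only '^(' is left and compilation fails.
verbose_error = None
try:
    re.compile('^' + HASHTAG_SEQ_REGEX_PATTERN, re.VERBOSE | re.IGNORECASE)
except re.error as exc:
    verbose_error = exc
print('re.error: %s' % verbose_error)
```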
Upvotes: 2
Reputation: 880677
Alternatively, use [#] to add a # sign to the regex that is not intended to start a comment:
HASHTAG_SEQ_REGEX_PATTERN = r"""
( #Outermost grouping to match overall regex
[#]\w+ #The hashtag matching. It's a valid combination of \w+
([:\s,]*[#]\w+)* #This is an optional (0 or more) sequence of hashtags separated by [\s,:]*
) #Closing parenthesis of outermost grouping to match overall regex
"""
I find this a little more readable.
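A quick check of the [#] approach, assuming the goal from the question of matching a hashtag sequence at the start of a tweet (the sample tweet is invented). Because the hash sits inside a character class, re.VERBOSE treats it as a literal, while the per-line comments are still stripped as comments.

```python
import re

# The [#] version of the pattern; comments after whitespace are real comments.
HASHTAG_SEQ_REGEX_PATTERN = r"""
(                     # outermost group: the whole hashtag sequence
    [#]\w+            # first hashtag; '#' inside a class is literal
    ([:\s,]*[#]\w+)*  # zero or more further hashtags separated by [:\s,]*
)                     # close outermost group
"""

LEFT_HASHTAG_REGEX_SEQ = re.compile('^' + HASHTAG_SEQ_REGEX_PATTERN,
                                    re.VERBOSE | re.IGNORECASE)

match = LEFT_HASHTAG_REGEX_SEQ.match('#Python #Regex, #verbose: compiles now')
print(match.group(1))  # prints: #Python #Regex, #verbose
```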
Upvotes: 0
Reputation: 62948
This line is commented out right after the FIRST #:
        v----comment starts here
([:\s,]*#\w+)* ...
Escape it:
([:\s,]*\#\w+)*
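This line has the same problem, but it doesn't cause the unbalanced parenthesis error; the whole line just becomes a comment, so the first hashtag is silently dropped from the pattern :)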
v----escape me
#\w+ #The hashtag matching ...
HASHTAG_SEQ_REGEX_PATTERN = r"""
( # Outermost grouping to match overall regex
\#\w+ # The hashtag matching. It's a valid combination of \w+
([:\s,]*\#\w+)* # This is an optional (0 or more) sequence of hashtags separated by [\s,:]*
) # Closing parenthesis of outermost grouping to match overall regex
"""
Upvotes: 5
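A sketch of the two regexes the question asks for, built from the escaped pattern in the answer above. The left regex mirrors the question's code; the right one (pattern + '$', used with search) is an assumption about what "from the end of the tweet" means, and the sample tweet is made up.

```python
import re

# Escaped pattern: '\#' is a literal '#' even in VERBOSE mode, so the text
# after each unescaped '#' on these lines is treated as a comment.
HASHTAG_SEQ_REGEX_PATTERN = r"""
(                    # outermost group: the whole hashtag sequence
    \#\w+            # first hashtag
    ([:\s,]*\#\w+)*  # zero or more further hashtags separated by [:\s,]*
)                    # close outermost group
"""

FLAGS = re.VERBOSE | re.IGNORECASE

# Anchored at the start of the tweet, as in the question ...
LEFT_HASHTAG_REGEX_SEQ = re.compile('^' + HASHTAG_SEQ_REGEX_PATTERN, FLAGS)
# ... and (assumed interpretation) anchored at the end of the tweet.
RIGHT_HASHTAG_REGEX_SEQ = re.compile(HASHTAG_SEQ_REGEX_PATTERN + '$', FLAGS)

tweet = '#python #regex: verbose mode bit me again #re, #python27'
print(LEFT_HASHTAG_REGEX_SEQ.match(tweet).group(1))    # prints: #python #regex
print(RIGHT_HASHTAG_REGEX_SEQ.search(tweet).group(1))  # prints: #re, #python27
```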
Reputation: 191789
You have some unescaped hashes there that you want to use legitimately, but VERBOSE is screwing you up:
\#\w+
([:\s,]*\#\w+)* #reported issue caused by this hash
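A minimal illustration of what VERBOSE does with an unescaped hash: everything from the # to the end of the line is discarded, so r'#\w+' compiles to an empty pattern that matches the empty string, while the escaped r'\#\w+' matches the hashtag itself.

```python
import re

# Unescaped '#' in VERBOSE mode: the whole pattern is a comment, i.e. empty.
unescaped = re.compile(r'#\w+', re.VERBOSE)
print(repr(unescaped.match('#python').group()))  # prints: ''

# Escaped '\#' stays a literal hash, so the hashtag is matched.
escaped = re.compile(r'\#\w+', re.VERBOSE)
print(escaped.match('#python').group())  # prints: #python
```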
Upvotes: 3