Reputation: 65
I'm having some trouble running the following code, which is designed to check for emoticons. I keep getting an error and cannot work out how to fix it.
Here is the code:
import re
patterns = r"""
(?:
[<>]?
[:;=8] # eyes
[\-o\*\']? # optional nose
[\)\]\(\[dDpP/\:\}\{@\|\\] # mouth
|
[\)\]\(\[dDpP/\:\}\{@\|\\] # mouth
[\-o\*\']? # optional nose
[:;=8] # eyes
[<>]?
)
"""
regexes= [re.compile(p) for p in patterns]
text = 'hi there! my name is SimonSchus and here is an emoticon :-)'
for regex in regexes:
    print('Looking for ', regex," in ",(regex.pattern, text))
    if regex.search(text):
        print('found a match!')
    else:
        print('no match')
The error that I'm getting is
raise error("unbalanced parenthesis")
sre_constants.error: unbalanced parenthesis
Clearly there is an error somewhere with the parentheses/brackets. However, I've escaped everything I can think of with a backslash and still can't work it out. Any ideas where I'm going wrong? From a bit of debugging, I think the error is in the regular expression itself, but I can't work out what exactly.
Simon.
Credit to Christopher Potts (http://sentiment.christopherpotts.net/code-data/happyfuntokenizing.py) from whom I found the emoticon expressions.
Upvotes: 0
Views: 1076
Reputation: 295706
regexes= [re.compile(p) for p in patterns]
...is trying to compile each character in the string as its own regex. Thus, when p is (, it expects (and can't find) a closing ); likewise for [ and ].
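To see why this happens, here is a small sketch (the sample string and the name failed are made up for illustration, not taken from the question). Iterating over a string yields one-character strings, and several of those characters are not valid regexes on their own:

```python
import re

# A shortened, hypothetical stand-in for the pattern in the question.
sample = r"(?:[:;=8])"

# Iterating over a str gives single characters, not whole patterns.
print(list(sample)[:3])  # ['(', '?', ':']

# Try to compile each character by itself and collect the ones that fail.
failed = []
for ch in sample:
    try:
        re.compile(ch)
    except re.error as exc:
        failed.append(ch)
        print('cannot compile', repr(ch), '->', exc)
```

The exact error message depends on the Python version, so it may not read "unbalanced parenthesis" word for word.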
Your patterns is just one string, not a list of them. Thus:
patterns_re = re.compile(patterns)
If you wanted a list of regexes, patterns would be defined as a list: patterns=[ ... ], not patterns=r"...".
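Putting that together, a minimal sketch of the whole thing might look like the following. One assumption on my part: since the pattern contains line breaks and # comments, it looks like it was written for verbose mode, so I pass re.VERBOSE here. Without that flag the whitespace and the "# eyes" text would be treated as literal characters to match. The names EMOTICON_PATTERN and emoticon_re are just illustrative.

```python
import re

# The pattern from the question, kept as ONE string (not iterated over).
EMOTICON_PATTERN = r"""
    (?:
      [<>]?
      [:;=8]                       # eyes
      [\-o\*\']?                   # optional nose
      [\)\]\(\[dDpP/\:\}\{@\|\\]   # mouth
      |
      [\)\]\(\[dDpP/\:\}\{@\|\\]   # mouth
      [\-o\*\']?                   # optional nose
      [:;=8]                       # eyes
      [<>]?
    )
"""

# re.VERBOSE (assumed to be the intent) makes the regex engine ignore
# the layout whitespace and the trailing comments inside the pattern.
emoticon_re = re.compile(EMOTICON_PATTERN, re.VERBOSE)

text = 'hi there! my name is SimonSchus and here is an emoticon :-)'

match = emoticon_re.search(text)
if match:
    print('found a match:', match.group())
else:
    print('no match')
```

If you later do want several patterns, put each complete pattern string in a list and compile them with the same list comprehension you already have.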
Upvotes: 3