SimonSchus

Reputation: 65

Regex expression error: "unbalanced parenthesis" but can't find error

I'm having some trouble running the following code which is designed to check for emoticons. I keep getting an error and cannot work out how to fix the issue.

Here is the code:

import re

patterns = r"""
    (?:
      [<>]?
      [:;=8]                     # eyes
      [\-o\*\']?                 # optional nose
      [\)\]\(\[dDpP/\:\}\{@\|\\] # mouth      
      |
      [\)\]\(\[dDpP/\:\}\{@\|\\] # mouth
      [\-o\*\']?                 # optional nose
      [:;=8]                     # eyes
      [<>]?
    )
"""

regexes= [re.compile(p) for p in patterns]


text = 'hi there! my name is SimonSchus and here is an emoticon :-)'


for regex in regexes:
    print('Looking for ', regex," in ",(regex.pattern, text))

    if regex.search(text):
        print('found a match!')
    else:
        print('no match')

The error that I'm getting is

raise error("unbalanced parenthesis")
sre_constants.error: unbalanced parenthesis

Clearly there is an error somewhere with the parentheses/brackets. However, I've escaped everything I can think of with a backslash and still can't work it out. Any ideas where I'm going wrong? From a bit of debugging, I think the error is in the regex itself, but I can't work out what exactly.

Simon.

Credit to Christopher Potts (http://sentiment.christopherpotts.net/code-data/happyfuntokenizing.py) from whom I found the emoticon expressions.

Upvotes: 0

Views: 1076

Answers (1)

Charles Duffy

Reputation: 295706

regexes= [re.compile(p) for p in patterns]

...iterates over the string one character at a time and tries to compile each character as its own regex. Thus, when p is (, the compiler expects (and can't find) a closing ); likewise for [ and ].
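A minimal sketch of what that comprehension actually iterates over. The short string `sample` is a hypothetical stand-in for the full pattern, and the exact error messages vary between Python versions, so they are printed rather than assumed:

```python
import re

sample = "(?:a)"  # hypothetical stand-in for the full pattern string

# Iterating over a str yields one-character strings, not whole patterns.
pieces = list(sample)

failures = []
for p in pieces:
    try:
        re.compile(p)
    except re.error as exc:
        # Record which single characters are not valid regexes on their own.
        failures.append((p, str(exc)))

for p, msg in failures:
    print(repr(p), "->", msg)
```

Characters such as `(` and `)` cannot stand alone as a regex, which is why the error points at parentheses even though the full pattern is balanced.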


Your patterns is just one string, not a list of them. Thus:

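patterns_re = re.compile(patterns, re.VERBOSE)

The re.VERBOSE flag is needed because the pattern is laid out with whitespace and # comments; without it they are treated as literal characters and the emoticons will not match.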

If you wanted a list of regexes, patterns would be defined as a list: patterns=[ ... ], not patterns=r"...".
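Putting the pieces together, here is a sketch of both variants using the pattern from the question unchanged. The names `pattern` and `emoticon_re` are illustrative, not from the original post. Note the `re.VERBOSE` flag, which is needed because the pattern contains layout whitespace and `#` comments:

```python
import re

# The pattern from the question, kept as a single string.
pattern = r"""
    (?:
      [<>]?
      [:;=8]                     # eyes
      [\-o\*\']?                 # optional nose
      [\)\]\(\[dDpP/\:\}\{@\|\\] # mouth
      |
      [\)\]\(\[dDpP/\:\}\{@\|\\] # mouth
      [\-o\*\']?                 # optional nose
      [:;=8]                     # eyes
      [<>]?
    )
"""

# Variant 1: one string, one compiled regex.
emoticon_re = re.compile(pattern, re.VERBOSE)

text = 'hi there! my name is SimonSchus and here is an emoticon :-)'
match = emoticon_re.search(text)
print(match.group(0) if match else 'no match')

# Variant 2: several patterns kept in a list of strings,
# so the comprehension sees whole patterns rather than characters.
patterns = [pattern]
regexes = [re.compile(p, re.VERBOSE) for p in patterns]
for regex in regexes:
    print('found a match!' if regex.search(text) else 'no match')
```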

Upvotes: 3
