Reputation: 677
I am trying to build a regex that matches the different possible combinations of negative Unicode emoticons, but I can't get it to match the kind of emoticons in the list test_2 below. I believe the non-alphanumeric symbols that make up these emoticons are correctly placed inside the regex, yet neither the emoticons nor the left eye (the capture group named eye1) are matched. How can I solve this? Thanks
# -*- coding: utf-8 -*-
import re
import unicodedata

neg_emoticon_regular = ur"""
[\((]? #optional left parenthesis
\s* #optional space
[\`\#\ ́]? #optional symbols between left parenthesis and left eye
(?P<eye1>[\ー\; \́\`\・\>Tt\ー\ ̄\−\-\゚~\_\.\>\*\/]) #left eye
\s* #optional space
[\。\。\Δ\-\人\O\0\.\Д\д\o\−\_\ω\ヘ\^\_]? #mouth
\s* #optional space
[(?P=eye1)\`\<\’] #right eye, usually will match left eye
[\A\#\;]? #optional symbols between right eye and right parenthesis
\s* #optional space
[\)\)]? #optional right parenthesis
"""
neg_emoticon_re = re.compile(neg_emoticon_regular, re.VERBOSE | re.UNICODE)
test_2 = ["(−_−#)","(-。-;","(-_-)"] #negative emoticons to match
for e in test_2:
    e_uc_norm = unicodedata.normalize('NFKC', e.decode("utf-8"))
    m = neg_emoticon_re.search(e_uc_norm)
    if m:
        print "eye1:", m.group("eye1")  # print the symbol that is supposed to be the left eye
    print len(neg_emoticon_re.findall(e_uc_norm)), e_uc_norm
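The right-eye line [(?P=eye1)\`\<\’] is itself a bracketed character set, so inside it (?P=eye1) is not a backreference but the literal characters ( ? P = e y 1 ). That is why - and − can never match as the right eye. The sketch below assumes Python 3 rather than the question's Python 2, and the name NEG_EMOTICON and its shortened symbol sets are hypothetical simplifications, not the question's full lists. It moves the backreference outside the brackets as (?:(?P=eye1)|[`<’]). Because the input is NFKC-normalized first, fullwidth parentheses and the halfwidth 。 are folded to (, ) and 。, so only those forms are listed. If the original sets are reused in Python 3.7+, escaped ASCII letters such as \A or \O inside a set are rejected as bad escapes, and \0 means the NUL character rather than the digit zero.

```python
import re
import unicodedata

# Inside [...], "(?P=eye1)" is NOT a backreference: it is just the
# literal characters ( ? P = e y 1 ), so "-" can never be a right eye.
broken_right_eye = re.compile(r"[(?P=eye1)`<’]")
print(broken_right_eye.fullmatch("-"))              # None
print(broken_right_eye.fullmatch("P") is not None)  # True

# Hypothetical, simplified symbol sets (not the question's full lists).
# The backreference now sits OUTSIDE any [...] set, inside an alternation.
NEG_EMOTICON = re.compile(r"""
    [(]?                          # optional left parenthesis
    \s*                           # optional space
    (?P<eye1>[-−_;`Tt^>*/.~])     # left eye, captured as eye1
    \s*
    [。._ω人Дд]?                  # optional mouth
    \s*
    (?:(?P=eye1)|[`<’])           # right eye: same as eye1, or one of ` < ’
    [#;]?                         # optional symbol before right parenthesis
    \s*
    [)]?                          # optional right parenthesis
""", re.VERBOSE)

test_2 = ["(−_−#)", "(-。-;", "(-_-)"]
for e in test_2:
    # Python 3 str is already Unicode, so no .decode() is needed
    e_uc_norm = unicodedata.normalize("NFKC", e)
    m = NEG_EMOTICON.search(e_uc_norm)
    if m:
        print("match:", m.group(0), "eye1:", m.group("eye1"))
    else:
        print("no match:", e_uc_norm)
```

With the backreference outside the set, all three test strings match in full and eye1 captures the left eye. Swapping in the complete symbol lists should behave the same way once the escapes are cleaned up.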
Upvotes: 2
Views: 930
Reputation: 20644
In a regex, [...] is a set of characters, so [\((] will match either an open parenthesis or a space (it can be shortened to [( ]), and [\s+]? will match an optional whitespace character or a plus sign.
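A quick Python 3 check of these points; the sample inputs are made up for illustration. It shows that a bracketed set consumes exactly one character and that + inside brackets is an ordinary character, so "any amount of whitespace" is written \s* rather than [\s+]?.

```python
import re

# [...] is a character set: it matches exactly ONE character from the set.
paren_or_space = re.compile(r"[( ]")   # "(" or a space
ws_or_plus = re.compile(r"[\s+]?")     # optional: one whitespace char or a literal "+"
any_ws = re.compile(r"\s*")            # zero or more whitespace characters

print(paren_or_space.findall("(a b)"))          # ['(', ' ']
print(ws_or_plus.fullmatch("+") is not None)    # True: "+" is literal inside [...]
print(ws_or_plus.fullmatch("  ") is not None)   # False: at most one character
print(any_ws.fullmatch("  ") is not None)       # True
```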
Upvotes: 3