Reputation: 24758
Why does this work:
>>> ss
u'\U0001f300'
>>> r = re.compile(u"[u'\U0001F300-\U0001F5FF']+", re.UNICODE)
>>> r.search(ss) # this works
<_sre.SRE_Match object at 0x7f359acf03d8>
But this doesn't:
>>> r = re.compile("[u'\U0001F300-\U0001F5FF']+", re.UNICODE)
>>> r.search(ss) # this doesn't
Based on Ignacio's answer below, this also works:
>>> r = re.compile(u"[\U0001F300-\U0001F5FF]+", re.UNICODE)
>>> r.search(ss)
<_sre.SRE_Match object at 0x7f359acf03d8>
Upvotes: 0
Views: 76
Reputation: 798606
Use a unicode pattern when performing a search on a unicode haystack.
Also, the u'...' should not be in the pattern: inside a unicode literal the \U escapes are already Unicode characters, so the extra u and quote characters just become literal members of the character class.
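A minimal sketch of the second point, assuming Python 3 or a wide (UCS-4) Python 2 build like the one the question appears to use. The names clean and stray are made up for illustration. It shows that the stray u and quote characters end up as ordinary members of the character class:

```python
# -*- coding: utf-8 -*-
import re

ss = u"\U0001F300"

# Pattern as recommended: a unicode literal containing only the range.
clean = re.compile(u"[\U0001F300-\U0001F5FF]+", re.UNICODE)

# Pattern from the question: the class also contains the literal
# characters u and ' (hypothetical name, for comparison only).
stray = re.compile(u"[u'\U0001F300-\U0001F5FF']+", re.UNICODE)

print(clean.search(ss) is not None)     # the symbol is in the range
print(clean.search(u"u'") is not None)  # plain ASCII is not matched
print(stray.search(u"u'") is not None)  # but the stray class matches it
```

So the first pattern in the question matches only because the range is still present in the class. It would also match any run of u or ' characters, which is probably not intended.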
Upvotes: 3