Ankur Agarwal

Reputation: 24758

Python unicode regex issue

Why does this work:

>>> ss
u'\U0001f300'
>>> r = re.compile(u"[u'\U0001F300-\U0001F5FF']+", re.UNICODE)
>>> r.search(ss) # this works
<_sre.SRE_Match object at 0x7f359acf03d8>

But this doesn't:

>>> r = re.compile("[u'\U0001F300-\U0001F5FF']+", re.UNICODE)
>>> r.search(ss) # this doesn't

Based on Ignacio's answer below, this also works:

>>> r = re.compile(u"[\U0001F300-\U0001F5FF]+", re.UNICODE)
>>> r.search(ss)
<_sre.SRE_Match object at 0x7f359acf03d8>

Upvotes: 0

Views: 76

Answers (1)

Ignacio Vazquez-Abrams

Reputation: 798606

Use a unicode pattern when performing a search on a unicode haystack.

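Also, the "u'...'" should not be in the pattern: inside the brackets, the u and the quotes are just extra characters in the character class. The \U escapes in a unicode literal are already Unicode characters without it.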
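
To illustrate both points, here is a minimal sketch. It assumes Python 3, where every str literal is already unicode and the u prefix is optional, so the behaviour of the mismatched case differs from the Python 2 session in the question: Python 3 raises TypeError instead of silently returning no match. The names clean, stray and mixed_ok are made up for this example.

```python
import re

ss = u"\U0001F300"

# Unicode pattern, nothing but the range inside the brackets.
clean = re.compile(u"[\U0001F300-\U0001F5FF]+", re.UNICODE)

# Same range, but with the stray u'...' left inside the brackets.
# The class now also contains the letter u and the single quote.
stray = re.compile(u"[u'\U0001F300-\U0001F5FF']+", re.UNICODE)

print(clean.search(ss) is not None)     # True: the symbol is in the range
print(stray.search(u"u'x").group())     # u' : plain ASCII text matches too
print(clean.search(u"u'x") is None)     # True: the clean class rejects it

# Mixing a bytes pattern with a unicode haystack is an error in Python 3.
try:
    re.compile(b"[a-z]+").search(ss)
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)                         # False
```

The stray pattern appears to work in the question only because the range is still present in the class; it will also match any run of u and ' characters, which is probably not intended.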

Upvotes: 3
