beehuang
beehuang

Reputation: 369

python 3 regex slash with Square Brackets

I want to find emoji in Python 3, and I print my string is \ud83d\ude0a
And I can found it by re.compile(r'(\\ud83d\\ude0a)')
But when I want to use Square Brackets to find like \ud83d[\ude00-\ude4f]
I write this re.compile(r'(\\ud83d([\\ude00-\\ude4f]))');
but just mapping ude0a in \ud83d\ude0a.

my entire code

str = '\\ud83d\\ude0a'
print(str)
emoji_pattern = re.compile(r'(\\ud83d([\\ude00-\\ude4f]))');
# emoji_pattern = re.compile(r'(\\ud83d\\ude0a)');
print(emoji_pattern.sub(r'', str))

Upvotes: 0

Views: 205

Answers (1)

DjLegolas
DjLegolas

Reputation: 76

The problem is in the way you use square brackets.
Square brackets are used for selecting a single char from the chars in the brackets. Therefor, when you wrote [\\ude00-\\ude4f], it will be translated to only one char in there (for example, \\, u, d, 0, etc.), and not as you wanted it to be, from \ud83d\ude00 to \ud83d\ude4f.

To fix this, try using (\\ud83d(\\ude[0-4][0-9a-f])). It will find the sequence of the chars \ud83d\ude and then char in the range of 0 to 4 and then one in the sequence of 0 to 9 or a to f. As a result, this will detect the wanted sequence, and can be inspect here.

Upvotes: 1

Related Questions